## Introduction
This notebook runs all the processing steps one by one for several models and renders the output. Each section can be run on its own after a kernel restart.

## Observations
* Symbolic tracing failed on every BERT model tried, because it creates proxies for all inputs, including mutually exclusive ones, e.g. in `DistilBertModel.forward`
  * This was fixed by making the `concrete_args` input to `fx.symbolic_trace` available to the `MAV` and `MavTracer` objects
  * For BERT models, `concrete_args={'inputs_embeds':None}` gets around this issue
* Even so, most NLP models branch on proxy variables (control flow that depends on traced values), which `torch.fx` symbolic tracing does not support
  * Fixing more arguments via `concrete_args` might work around this. To be investigated.
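
The two failure modes above can be reproduced without downloading a model. The sketch below calls `fx.symbolic_trace` directly rather than going through `MAV`. `TinyEncoder`, `Gate` and `try_trace` are hypothetical names made up for this illustration and are not part of `transformers` or `idlmav`. `TinyEncoder` only imitates the mutually exclusive `input_ids` / `inputs_embeds` check, and `Gate` imitates a branch on tensor values.

```python
import torch
from torch import fx, nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a BERT-style model with mutually exclusive inputs."""

    def __init__(self, vocab_size=16, hidden=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, input_ids=None, inputs_embeds=None):
        # Exactly one of the two inputs may be given
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("Specify either input_ids or inputs_embeds, not both")
        if inputs_embeds is None:
            inputs_embeds = self.embed(input_ids)
        return self.proj(inputs_embeds)


class Gate(nn.Module):
    """Hypothetical module whose branch depends on tensor values."""

    def forward(self, x):
        if x.sum() > 0:  # data-dependent control flow on a proxy
            return x * 2
        return x


def try_trace(module, concrete_args=None):
    """Return (GraphModule, None) on success or (None, error text) on failure."""
    try:
        return fx.symbolic_trace(module, concrete_args=concrete_args), None
    except Exception as e:  # the exception type depends on the failure mode
        return None, f"{type(e).__name__}: {e}"


model = TinyEncoder().eval()

# Both inputs become proxies, so the "not both" check raises during tracing
_, err_plain = try_trace(model)

# Pinning inputs_embeds to None leaves input_ids as the only traced input
traced, err_fixed = try_trace(model, {"inputs_embeds": None})

# Branching on a traced value is rejected by torch.fx
_, err_gate = try_trace(Gate())

print("plain trace:    ", err_plain)
print("concrete_args:  ", "ok" if err_fixed is None else err_fixed)
print("data-dependent: ", err_gate)
```

In this sketch `concrete_args` only helps with the first case, because the branch in `TinyEncoder` depends on whether an argument is `None`. The branch in `Gate` depends on tensor values, so no fixed argument removes it.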

## Wav2Vec

In [1]:
import sys
sys.path.append('..')
from transformers import Wav2Vec2Model, Wav2Vec2Processor
import torch
from idlmav import MAV, plotly_renderer

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base")
model.eval()
inputs = torch.randn(1, 16000)
device = 'cpu'

mav = MAV(model, inputs)
with plotly_renderer('notebook_connected'): mav.show_figure()


Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.



Tracing failed with torch.fx.symbolic_trace: zeros() received an invalid combination of arguments - got (tuple, device=Attribute, dtype=Attribute), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)
 * (tuple of ints size, *, Tensor out = None, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)

Tracing with torch.compile


INFO:2025-02-22 12:15:28 28484:28484 init.cpp:181] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti


## Whisper tiny encoder

In [None]:
import sys
sys.path.append('..')
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch
from idlmav import MAV, plotly_renderer

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model.eval()
inputs = torch.randn((1,80,3000))
device = 'cpu'

mav = MAV(model.model.encoder, inputs)
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile


## Audio Spectrogram Transformer (AST)
* TODO: Figure out required input size

In [1]:
import sys
sys.path.append('..')
from transformers import ASTConfig, ASTModel
import torch
from idlmav import MAV, plotly_renderer

config = ASTConfig()
model = ASTModel(config)
model.eval()
inputs = torch.randn(1, config.max_length, config.num_mel_bins)
device = 'cpu'

mav = MAV(model, inputs)
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile


INFO:2025-02-22 12:41:14 38951:38951 init.cpp:181] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti


## Seamless M4T
* TODO: Run on better hardware

In [None]:
!wget -Otim.mp3 https://upload.wikimedia.org/wikipedia/commons/transcoded/1/16/Tim_Berners-Lee_-_Today_%28flac_-sample_s16_-f_ogg%29.oga/Tim_Berners-Lee_-_Today_%28flac_-sample_s16_-f_ogg%29.oga.mp3

--2025-02-19 02:06:21--  https://upload.wikimedia.org/wikipedia/commons/transcoded/1/16/Tim_Berners-Lee_-_Today_%28flac_-sample_s16_-f_ogg%29.oga/Tim_Berners-Lee_-_Today_%28flac_-sample_s16_-f_ogg%29.oga.mp3
Resolving upload.wikimedia.org (upload.wikimedia.org)... 185.15.58.240, 2a02:ec80:600:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.58.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256392 (1.2M) [audio/mpeg]
Saving to: ‘tim.mp3’


2025-02-19 02:06:23 (1.16 MB/s) - ‘tim.mp3’ saved [1256392/1256392]



In [None]:
import torchaudio
from transformers import AutoProcessor, SeamlessM4TModel
processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium", sampling_rate=8_000)
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium", sampling_rate=8_000)

# Read an audio file and resample:
audio, orig_freq = torchaudio.load("tim.mp3")
# Resample to 8 kHz to match the `sampling_rate=8_000` passed above (the upstream example for this model resamples to 16 kHz)
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=8_000)
audio_inputs = processor(audios=audio, return_tensors="pt")


It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.


In [None]:
# Convert audio to text
output_tokens = model.generate(**audio_inputs, tgt_lang="eng", generate_speech=False)
translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
translated_text_from_audio


"But when you think about the files on your computer, the emails, the files you put on the web, but there are data files on the web, like calendars and downloads, which you can't put on the web, because if you put the calendar on the web, you have to do everything you need to do with the data you need to compare the other calendars at the same time."

In [None]:
import sys
sys.path.append('..')
from idlmav import MAV, plotly_renderer

mav = MAV(model.speech_encoder, audio_inputs, device='cpu')
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile


INFO:2025-02-19 02:24:18 174222:174222 init.cpp:181] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti



## Others under consideration
* [melodyflow-t24-30secs](https://huggingface.co/facebook/melodyflow-t24-30secs)
* [MAGNeT](https://huggingface.co/facebook/magnet-small-30secs)