# audiocraft-mps debugging notebook

### Local execution environment
- macOS (M1 Pro): Ventura 13.5.1
- Python: 3.11.4

### Setup

To install audiocraft, please check the [official documentation](https://github.com/facebookresearch/audiocraft).

Note: Updating macOS to >=13.0 is required to run LSTM models on MPS.

Run these commands before running this notebook.
```sh-session
$ conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1
$ conda activate <AUDIOCRAFT_VIRTUAL_ENV>
```
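
Before loading the models, it can help to confirm that the environment variable was actually picked up and that PyTorch sees the MPS backend. The cell below is a minimal sketch of such a check; the helper name `mps_fallback_enabled` is hypothetical and not part of audiocraft or PyTorch.

```python
import os


def mps_fallback_enabled(environ=os.environ):
    """Return True if PYTORCH_ENABLE_MPS_FALLBACK is set to '1' in the given mapping.

    Hypothetical helper for this notebook; it only inspects the environment.
    """
    return environ.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"


try:
    import torch

    # torch.backends.mps.is_available() reports whether the MPS device can be used.
    mps_available = bool(torch.backends.mps.is_available())
except (ImportError, AttributeError):
    # torch is missing, or too old to expose the MPS backend.
    mps_available = False

print("MPS fallback enabled:", mps_fallback_enabled())
print("MPS available:", mps_available)
```

If either line prints `False`, re-activating the conda environment after setting the variable is a reasonable first thing to try.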

In [1]:
from audiocraft.models import AudioGen
from audiocraft.utils.notebook import display_audio

cpu_model = AudioGen.get_pretrained('facebook/audiogen-medium', device='cpu')
mps_model = AudioGen.get_pretrained('facebook/audiogen-medium', device='mps')

objc[43083]: Class AVFFrameReceiver is implemented in both /Users/ebarakoji/miniforge3/envs/audiogen/lib/libavdevice.58.8.100.dylib (0x1089a0798) and /Users/ebarakoji/miniforge3/envs/audiogen/lib/python3.11/site-packages/av/.dylibs/libavdevice.59.7.100.dylib (0x168cf8778). One of the two will be used. Which one is undefined.
objc[43083]: Class AVFAudioReceiver is implemented in both /Users/ebarakoji/miniforge3/envs/audiogen/lib/libavdevice.58.8.100.dylib (0x1089a07e8) and /Users/ebarakoji/miniforge3/envs/audiogen/lib/python3.11/site-packages/av/.dylibs/libavdevice.59.7.100.dylib (0x168cf87c8). One of the two will be used. Which one is undefined.
  tensor.erfinv_()


In [2]:
import os
import torchaudio

prompt_path = os.path.join(os.path.abspath(os.pardir), 'examples', 'outputs', 'crow.wav')
prompt, sample_rate = torchaudio.load(prompt_path)

In [3]:
# Disable sampling so that gen_tokens can be compared between CPU and MPS; note that this degrades the generated audio.
gen_params_dict = {
    'use_sampling': False,
    'top_k': 250,
}

cpu_model.set_generation_params(**gen_params_dict)
mps_model.set_generation_params(**gen_params_dict)


gen_args = {
    'prompt': prompt,
    'prompt_sample_rate': sample_rate,
    'descriptions': ['A crow is cawing'],
    'progress': True,
    'debug_tokens': True,
}

In [4]:
# Note: the generate function was modified locally to return a tuple (output, prompt_tokens, gen_tokens) so the decoder inputs can be inspected.
cpu_output, cpu_prompt_tokens, cpu_gen_tokens = cpu_model.generate_continuation(**gen_args)

   253 /    500

In [5]:
mps_output, mps_prompt_tokens, mps_gen_tokens = mps_model.generate_continuation(**gen_args)

   253 /    500

In [6]:
(cpu_output == mps_output.to('cpu')).all()

tensor(False)

In [7]:
(cpu_gen_tokens == mps_gen_tokens.to('cpu')).all()

tensor(False)

In [8]:
(cpu_prompt_tokens == mps_prompt_tokens.to('cpu')).all()

tensor(False)
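
A single `tensor(False)` does not say how far apart the two token tensors are. The sketch below summarizes the mismatch as a fraction and as the first time step at which any codebook differs. It assumes the tokens are integer tensors of shape `[B, K, T]` (batch, codebooks, time steps) and that both tensors have the same shape; `token_mismatch_report` is a hypothetical helper, not an audiocraft function.

```python
import torch


def token_mismatch_report(a, b):
    """Summarize where two token tensors of shape [B, K, T] disagree.

    Assumes both tensors have the same shape; hypothetical helper.
    """
    a, b = a.cpu(), b.cpu()
    diff = a != b                            # boolean tensor, shape [B, K, T]
    per_step = diff.any(dim=0).any(dim=0)    # True for time steps with any mismatch, shape [T]
    steps = per_step.nonzero()               # indices of mismatching time steps, shape [n, 1]
    first = int(steps[0]) if steps.numel() > 0 else None
    return {
        "mismatch_fraction": diff.float().mean().item(),
        "first_mismatch_step": first,
    }
```

For example, `token_mismatch_report(cpu_gen_tokens, mps_gen_tokens)` and `token_mismatch_report(cpu_prompt_tokens, mps_prompt_tokens)` would show whether the prompt tokens already diverge at the first steps or only a few entries differ.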

In [19]:
prompt.dim()

2

In [20]:
cpu_x, cpu_scale = cpu_model.compression_model.preprocess(prompt[None])
mps_x, mps_scale = mps_model.compression_model.preprocess(prompt[None].to('mps'))

print((cpu_x == mps_x.to('cpu')).all())
print(cpu_scale == mps_scale)

tensor(True)
True


In [22]:
cpu_emb = cpu_model.compression_model.encoder(cpu_x)
mps_emb = mps_model.compression_model.encoder(mps_x).to('cpu')
(cpu_emb == mps_emb).all()

tensor(False)
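
Exact equality is a strict test for floating-point outputs computed on two different backends, so a `False` here does not show how large the difference is. The sketch below reports the maximum and mean absolute difference and a tolerance-based check with `torch.allclose`. The helper name `float_diff_report` and the default tolerances are assumptions chosen for illustration.

```python
import torch


def float_diff_report(a, b, rtol=1e-5, atol=1e-5):
    """Compare two float tensors of the same shape; hypothetical helper."""
    a = a.detach().cpu().float()
    b = b.detach().cpu().float()
    abs_diff = (a - b).abs()
    return {
        "max_abs_diff": abs_diff.max().item(),
        "mean_abs_diff": abs_diff.mean().item(),
        # True if |a - b| <= atol + rtol * |b| holds elementwise
        "allclose": torch.allclose(a, b, rtol=rtol, atol=atol),
    }
```

Calling `float_diff_report(cpu_emb, mps_emb)` would indicate whether the encoder outputs differ only by small rounding error or by a larger amount.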

In [23]:
# Feed the same CPU embedding to both quantizers to test the quantizer in isolation.
cpu_codes = cpu_model.compression_model.quantizer.encode(cpu_emb)
mps_codes = mps_model.compression_model.quantizer.encode(cpu_emb.to('mps')).to('cpu')

(cpu_codes == mps_codes).all()

tensor(True)
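
The cells above suggest that the quantizer produces identical codes on both devices when given the same embedding, while the encoder outputs differ. A possible next step is to record the output of every leaf layer of the encoder on each device and look for the first layer whose outputs disagree. The sketch below does this with standard PyTorch forward hooks; `capture_leaf_outputs` and `first_divergent_layer` are hypothetical helpers, and the tolerance is an assumption.

```python
import torch
from torch import nn


def capture_leaf_outputs(module, x):
    """Run module(x) and record each leaf submodule's output, keyed by module name."""
    outputs, handles = {}, []

    def make_hook(name):
        def hook(_mod, _inp, out):
            # Some layers (e.g. nn.LSTM) return a tuple; keep the first element.
            tensor = out[0] if isinstance(out, tuple) else out
            if torch.is_tensor(tensor):
                outputs[name] = tensor.detach().cpu()
        return hook

    for name, sub in module.named_modules():
        if len(list(sub.children())) == 0:  # leaf modules only
            handles.append(sub.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            module(x)
    finally:
        # Always remove the hooks so the module is left unchanged.
        for handle in handles:
            handle.remove()
    return outputs


def first_divergent_layer(ref, other, atol=1e-5):
    """Return the name of the first recorded layer whose outputs differ, or None."""
    for name, ref_out in ref.items():
        if not torch.allclose(ref_out, other[name], atol=atol):
            return name
    return None
```

With the objects from this notebook, the two captures would be `capture_leaf_outputs(cpu_model.compression_model.encoder, cpu_x)` and `capture_leaf_outputs(mps_model.compression_model.encoder, mps_x)`, passed to `first_divergent_layer` in that order. Layers are reported in execution order, so the returned name is the earliest point where the two devices disagree beyond the tolerance.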