# Step 1: Import Libraries and Set Up Environment
In this step, we import the libraries used throughout the notebook and configure PyTorch. librosa loads and manipulates audio, soundfile reads and writes audio files, numpy handles arrays, requests downloads files, IPython.display plays audio inline, and torch loads and runs the pretrained model. Because this notebook only runs inference and never trains a model, we disable gradient computation in PyTorch.

Script written by Iran R. Roman (iran [@] ccrma.stanford.edu), iranroman.github.io

In [None]:
# imports
from IPython.display import Audio, display
import librosa as li
import soundfile as sf
import numpy as np
import requests
import torch
torch.set_grad_enabled(False)


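
Before running the rest of the notebook, it can help to confirm that the required packages are actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report_versions` are hypothetical, and the package list assumes the usual PyPI distribution names.

```python
from importlib import metadata

# Distribution names as published on PyPI (an assumption; adjust for your setup).
PACKAGES = ["librosa", "soundfile", "numpy", "requests", "torch"]

def installed_version(name):
    """Return the installed version string for `name`, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def report_versions(packages=PACKAGES):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}

for name, version in report_versions().items():
    print(f"{name:10s} {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with pip before re-running the import cell.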

# Step 2: Download and Load an Audio File
In this step, we download an audio file from a URL and save it locally. We then load it with librosa, which resamples the audio to the requested rate (44,100 Hz here) and returns a mono floating-point array by default. Finally, we play the audio to confirm that it was downloaded and loaded correctly.

In [None]:
# helper for downloading files (reused for the model weights in Step 3)

def download_file(url, file_path):
    """
    Download file from a given URL and save it to the specified file path.
    """
    response = requests.get(url, timeout=60)  # avoid hanging forever on a stalled connection
    response.raise_for_status()  # raise an exception on an HTTP error status (4xx/5xx)

    with open(file_path, 'wb') as file:
        file.write(response.content)

# download an audio file
url = 'https://ccrma.stanford.edu/~jos/wav/gtr-nylon22.wav'
file_path = 'audio.wav'
download_file(url, file_path)

# load the file, resampled to 44.1 kHz
x, sr = li.load(file_path, sr=44100)

display(Audio(x, rate=sr))
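
Because librosa resamples on load, the value of `sr` above does not tell you the file's native format. As a rough sketch, the standard-library `wave` module can read the header directly. This assumes the file is uncompressed PCM, which `wave` requires and which is not confirmed for the downloaded file; the helper name `wav_info` is hypothetical. The demo writes a synthetic tone so that it runs without network access.

```python
import math
import os
import struct
import tempfile
import wave

def wav_info(path):
    """Read basic header fields from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "bit_depth": 8 * wf.getsampwidth(),
            "duration_s": frames / rate,
        }

# Demo on a synthetic file: one second of a 440 Hz tone, 16-bit mono at 22,050 Hz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
demo_rate = 22050
samples = [
    int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / demo_rate))
    for n in range(demo_rate)
]
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
    wf.setframerate(demo_rate)
    wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))

info = wav_info(demo_path)
print(info)
```

Calling `wav_info('audio.wav')` on the downloaded file would show whether its native sample rate differs from the 44,100 Hz requested from librosa.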

# Step 3: Download RAVE Model Parameters and Build the Model
In this step, we download pretrained RAVE model parameters/weights from a URL and save them locally as a TorchScript file. We then load the file with torch.jit.load and set the model to evaluation mode. The loaded model provides encode and decode methods, which the following steps use to reconstruct and generate audio. The commented-out URLs select alternative pretrained models.

In [None]:
# download rave parameters/weights and build the model
url = 'https://play.forum.ircam.fr/rave-vst-api/get_model/percussion'
# url = 'https://play.forum.ircam.fr/rave-vst-api/get_model/vintage'
# url = 'https://play.forum.ircam.fr/rave-vst-api/get_model/nasa'
# url = 'https://play.forum.ircam.fr/rave-vst-api/get_model/darbouka_onnx'
# url = 'https://play.forum.ircam.fr/rave-vst-api/get_model/VCTK'
# you can learn more about each model at:
# https://acids-ircam.github.io/rave_models_download
file_path = 'model.ts'
download_file(url, file_path)

model = torch.jit.load("model.ts").eval()
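
If the download returns something other than a model (an HTML error page, for example), `torch.jit.load` fails with an error that can be hard to interpret. The sketch below is a cheap pre-check that relies on the fact that files written by `torch.jit.save` are zip archives. The helper name `looks_like_torchscript` is hypothetical, and passing the check does not guarantee that the file is a valid model.

```python
import os
import tempfile
import zipfile

def looks_like_torchscript(path):
    """Heuristic: the file exists, is non-empty, and is a zip archive."""
    return (
        os.path.isfile(path)
        and os.path.getsize(path) > 0
        and zipfile.is_zipfile(path)
    )

# Demo with two throwaway files standing in for real downloads.
tmp_dir = tempfile.mkdtemp()

good = os.path.join(tmp_dir, "fake_model.ts")
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("placeholder.txt", "stand-in for a real archive")

bad = os.path.join(tmp_dir, "error_page.ts")
with open(bad, "w") as f:
    f.write("<html>503 Service Unavailable</html>")

print(looks_like_torchscript(good))  # True
print(looks_like_torchscript(bad))   # False
```

In the notebook, this check would be applied to `'model.ts'` before calling `torch.jit.load`.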

# Step 4: Encode and Decode Audio with RAVE Model
In this step, we use the RAVE model to encode and decode an audio signal. Encoding transforms the audio into a latent representation, and decoding reconstructs a waveform from that representation. Before encoding, the audio is reshaped into a three-dimensional tensor with one batch entry and one channel. We then save and play the reconstructed audio so it can be compared with the original.

In [None]:
x, sr = li.load('audio.wav', sr=44100)
# reshape to (batch, channels, samples) with one batch entry and one channel
x = torch.from_numpy(x).reshape(1, 1, -1)

# encode and decode the audio with RAVE
z = model.encode(x)
x_hat = model.decode(z).numpy().reshape(-1)

sf.write("model_output.wav", x_hat, sr)
display(Audio(x_hat, rate=sr))
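
Listening is the main comparison here, but a rough numeric check can complement it. The sketch below compares the lengths and RMS levels of two signals with numpy. The helper names `rms` and `compare_signals` are hypothetical, and the demo uses synthetic stand-in signals rather than real model output. A sample-by-sample error is deliberately left out, on the assumption that a neural reconstruction may not be sample-aligned with its input.

```python
import numpy as np

def rms(signal):
    """Root-mean-square level of a 1-D signal."""
    signal = np.asarray(signal, dtype=np.float64)
    return float(np.sqrt(np.mean(signal ** 2)))

def compare_signals(original, reconstruction, sr):
    """Summarize length and level differences between two 1-D signals."""
    n = min(len(original), len(reconstruction))  # compare the overlapping part only
    return {
        "length_diff_samples": len(reconstruction) - len(original),
        "compared_seconds": n / sr,
        "rms_original": rms(original[:n]),
        "rms_reconstruction": rms(reconstruction[:n]),
    }

# Stand-in signals: a 220 Hz tone and a quieter, slightly longer copy of it.
demo_sr = 44100
t = np.arange(demo_sr) / demo_sr
demo_original = 0.5 * np.sin(2 * np.pi * 220 * t)
demo_reconstruction = np.concatenate(
    [0.4 * np.sin(2 * np.pi * 220 * t), np.zeros(100)]
)

stats = compare_signals(demo_original, demo_reconstruction, demo_sr)
print(stats)
```

In the notebook, the call would be `compare_signals(x.numpy().reshape(-1), x_hat, sr)`, since `x` is a tensor at this point.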

# Step 5: Generate Audio from Random Numbers
In this step, we generate audio by decoding random latent vectors with the RAVE model. We draw random numbers with the same shape as the latent representation z from Step 4 (so Step 4 must be run first) and decode them into an audio waveform. We then save and play the generated audio.

In [None]:
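# generate audio from random numbers
# draw standard-normal latents with the same shape and dtype as the encoded z
# from Step 4 (float16 input may trigger a dtype mismatch inside the model)
z = torch.randn_like(z)
x_hat = model.decode(z).numpy().reshape(-1)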

sf.write("model_output.wav", x_hat, sr)
display(Audio(x_hat, rate=sr))
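
Each run of the cell above produces different audio. To make a result repeatable, the random latents can be drawn from a seeded generator. This is a minimal sketch with numpy; the helper name `random_latents`, the `scale` parameter, and the demo shape `(1, 16, 50)` are all hypothetical and not taken from the RAVE model.

```python
import numpy as np

def random_latents(shape, seed=None, scale=1.0):
    """Draw float32 standard-normal latents of the given shape, optionally scaled."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape, dtype=np.float32)
    return z * np.float32(scale)  # stays float32

# Hypothetical latent shape: (batch, latent dimensions, time frames).
demo_shape = (1, 16, 50)
z_a = random_latents(demo_shape, seed=0)
z_b = random_latents(demo_shape, seed=0)

print(z_a.shape, z_a.dtype)
print(np.array_equal(z_a, z_b))  # same seed, same latents
```

In the notebook, the shape would come from the encoded signal, for example `torch.from_numpy(random_latents(tuple(z.shape), seed=0))`. Varying `scale` is one way to explore how far the latents can move from the origin before the output degrades.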