# Exploring RAVE in Python



## Global architecture of RAVE

Recall the global architecture of RAVE:

![RAVE](assets/rave_simple.png)

It has two main parts:
- an *encoder*, which encodes a given audio signal into a time series of *latent* representations
- a *decoder*, which decodes a time series of latent representations into an audio signal

In addition to this pair of modules (called an *auto-encoder* for the remainder of this notebook), RAVE also has a *discriminator*, which is used during training to refine the auto-encoder.
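
To make the shape contract of such an encoder/decoder pair concrete, here is a minimal NumPy sketch. It is a toy linear stand-in and not RAVE's architecture: `RATIO`, `LATENT_DIM`, `toy_encode` and `toy_decode` are hypothetical names chosen for illustration only.

```python
import numpy as np

RATIO = 8        # hypothetical downsampling rate (RAVE's is much larger)
LATENT_DIM = 4   # hypothetical latent size (RAVE's full latent space has 128 dimensions)

rng = np.random.default_rng(0)
W = rng.standard_normal((LATENT_DIM, RATIO))  # toy linear "encoder" weights


def toy_encode(x):
    """(n_examples, 1, n_samples) -> (n_examples, LATENT_DIM, n_steps)."""
    n_examples, _, n_samples = x.shape
    n_steps = n_samples // RATIO
    # cut the signal into non-overlapping frames of RATIO samples
    frames = x[:, 0, :n_steps * RATIO].reshape(n_examples, n_steps, RATIO)
    # project every frame onto LATENT_DIM values
    return np.einsum("lr,bsr->bls", W, frames)


def toy_decode(z):
    """(n_examples, LATENT_DIM, n_steps) -> (n_examples, 1, n_steps * RATIO)."""
    # map every latent vector back to a frame with the pseudo-inverse of W
    frames = np.einsum("rl,bls->bsr", np.linalg.pinv(W), z)
    return frames.reshape(z.shape[0], 1, -1)


x = rng.standard_normal((2, 1, 64))
z = toy_encode(x)
x_hat = toy_decode(z)
print(x.shape, z.shape, x_hat.shape)
```

The latent series is shorter than the audio by a factor of `RATIO` and has `LATENT_DIM` channels per step. The reconstruction is lossy here because each frame of 8 samples is squeezed into 4 values; a trained auto-encoder learns its mappings instead of using a random projection.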

What are these modules called in the RAVE object? Quite simply: we're not that mischievous. 

In [None]:
import torchbend; print(torchbend.__file__)
from dandb import download_models, import_model

models = download_models()
print("downloaded models :", models)

# pick one of the downloaded models by name
current_model = models["sol_full_nopqmf"]
model = import_model(current_model)

encoder = model.encoder
decoder = model.decoder 
discriminator = model.discriminator

print('encoder type : ', encoder)
print('decoder type : ', decoder)
print('discriminator type : ', discriminator)

Here these three modules are instances of our `BendingModule` bending wrapper, but you can recover the actual original modules by accessing the `module` attribute. 

In [None]:
# /!\ executing this will output a huge amount of text describing the whole encoder architecture
# maybe not a good idea for manic sensibilities
# still want it? uncomment the line below

# print(model.encoder.module)

OK, let's now reconstruct a given sound. The most straightforward way is to use the `forward` function, which in our case encodes and then decodes an incoming signal. 

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys; sys.path.append('../torchbend')
import torchaudio
from dandb import get_sounds, plot_audio

# load default sounds
sounds = get_sounds()
x = sounds.load('violin.wav', sr=model.sample_rate)

# you can also load sounds from a target folder with a custom path : 
# sounds = get_sounds(...yourpath...)
# print(sounds)
# x, sr = sounds.load(...yourfiles...)

print('total shape of input tensor (n_examples x n_channels x n_samples):')
print(x.shape)

out = model.forward(x)
for i in range(len(out)):
    plot_audio(x[i], name="original sample %d"%i)
    plot_audio(out[i], name="reconstruction %d"%i)


Let's now extract the latent trajectories of a RAVE model. These can be obtained for a set of sounds (here the tensor `x`) using the function `encode` of the model, as below.

***Tip***: you can display a single curve by double-clicking its entry in the legend on the right.

In [None]:
from dandb import plot_latent_trajs

z = model.encode(x)
plot_latent_trajs(z)

We visualize here the **full** latent space, with all 128 dimensions: we can see that most of the latent positions lie between -3 and 3, as the latent space is regularized to resemble an isotropic Gaussian ($\mu = 0, \sigma = 1$): 

![normal distribution](assets/normal.png)
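
The -3 to 3 range quoted above follows from the properties of a standard Gaussian. As a quick check, this sketch computes the probability mass within a few standard deviations using only the standard library; `fraction_within` is a hypothetical helper name, and the result describes an ideal Gaussian, which the regularized latent space only approximates.

```python
import math


def fraction_within(k_sigma):
    """Probability that a standard Gaussian sample falls in [-k_sigma, k_sigma]."""
    return math.erf(k_sigma / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/-{k} sigma: {fraction_within(k):.4f}")
```

For an ideal Gaussian, about 99.7% of the values of each dimension fall within three standard deviations, which is why so few latent positions leave that range.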

We can also see that the latent trajectories are about 621 steps long, which corresponds roughly to 1273970 / 2048 (the first value being the length of the sample, and the second the downsampling rate of a casual RAVE). We can invert this full representation by *decoding* this latent series back into the audio domain with the `decode` method:

In [None]:
from dandb import plot_audio
reconstructions = model.decode(z)
for rec in reconstructions: 
    plot_audio(rec, display=True)
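
The number of latent steps quoted earlier can be estimated from the sample length and the downsampling rate. The sketch below only redoes that arithmetic with the two values given in the text; it does not query the model.

```python
import math

n_samples = 1273970  # length of the input sample, as quoted in the text
ratio = 2048         # downsampling rate, as quoted in the text

exact = n_samples / ratio
print(f"{exact:.2f} latent steps (floor={math.floor(exact)}, ceil={math.ceil(exact)})")
```

The estimate is within a step or two of the length seen in the plot; the exact count depends on how the model pads or trims the edges of the signal, which is not specified here.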

## Compressed vs full latent space

When interacting with RAVE through the VST or the nn~ external, keep in mind that you are not directly interacting with this 128-dimensional latent space but with a reduced latent space, whose number of dimensions is specified when exporting the raw RAVE model as a compressed `.ts` file. This intermediate representation is obtained with [Principal Component Analysis](https://en.wikipedia.org/wiki/Principal_component_analysis), which has the nice property of telling you how many reduced dimensions are needed to "reconstruct" the original space with a given fidelity (given in %). If the PCA has the same number of dimensions as the original space (here 128), the **reconstruction is lossless**, such that full recovery of the latent space is possible.

The following code shows the fidelity curve (that is, the number of dimensions needed to reconstruct a given percentage of the original dataset) of the loaded model. If you want to obtain the reduced latent space using the model, you can use the `postprocess` keyword of `encode`, together with the matching `preprocess` keyword of `decode` (only available through RAVE's `torchbend` interface).

In [None]:
import numpy as np 
import plotly.express as px 

target_fidelity = np.linspace(0., 1., 100, endpoint=False)
# build a concrete list: a lazy `map` object is not a valid data column for plotting
n_dims_for_fidelity = [model.get_dims_for_fidelity(f) for f in target_fidelity]
fig = px.line({"% fidelity": target_fidelity,'latent dims': n_dims_for_fidelity}, x="% fidelity", y="latent dims", title="number of dimensions for target fidelity %")
fig.show()

# the latent variables will be projected onto the PCA space when postprocess=True
z = model.encode(x, postprocess=True)
x_rec = model.decode(z, preprocess=True)

# the PCA is not cropped yet, such that the obtained space is lossless.
# let's find the number of latent dimensions needed to reach 80% fidelity:
target_fidelity = 0.8
n_dims = model.get_dims_for_fidelity(target_fidelity)

# and crop the latent representation to the obtained number of dimensions
z_reduced = z[..., :n_dims, :]
plot_latent_trajs(z_reduced)
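
To see what a fidelity curve and a lossless full-size PCA mean outside of RAVE, here is a minimal NumPy sketch on synthetic data. It assumes that fidelity is measured as the cumulative explained-variance ratio, which is a common convention but not confirmed to be what `get_dims_for_fidelity` computes; `dims_for_fidelity` and `reconstruct` are hypothetical helpers, and the data is random.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_dims = 2000, 16  # hypothetical sizes (RAVE's latent space has 128 dimensions)

# synthetic "latent" points: variance decays across axes, then a random rotation mixes them
scales = np.linspace(3.0, 0.1, n_dims)
rotation, _ = np.linalg.qr(rng.standard_normal((n_dims, n_dims)))
z = (rng.standard_normal((n_points, n_dims)) * scales) @ rotation

# PCA via SVD: rows of `components` are principal directions, by decreasing variance
mean = z.mean(axis=0)
_, singular_values, components = np.linalg.svd(z - mean, full_matrices=False)
cumulative = np.cumsum(singular_values**2) / np.sum(singular_values**2)


def dims_for_fidelity(fidelity):
    """Smallest number of components whose cumulative variance ratio reaches `fidelity`."""
    return min(int(np.searchsorted(cumulative, fidelity)) + 1, n_dims)


def reconstruct(k):
    """Project onto the first k components, then map back to the original space."""
    z_pca = (z - mean) @ components.T
    return z_pca[:, :k] @ components[:k] + mean


k = dims_for_fidelity(0.8)
residual = np.sum((z - reconstruct(k)) ** 2) / np.sum((z - mean) ** 2)
print(f"{k} of {n_dims} dims reach 80% fidelity, residual variance ratio {residual:.3f}")
print("lossless with all dims:", np.allclose(reconstruct(n_dims), z))
```

Keeping all components only rotates the space, so nothing is lost; cropping to the first `k` components discards the directions that carry the least variance.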

Using Python code to interact with the latent space allows modifications that are not possible with a compressed latent space, but... things are a little chaotic. Still, you can have some fun by trying things out:

In [None]:
import torch

sounds = get_sounds()
x = sounds.load('violin.wav', 'piano.wav', sr=model.sample_rate)
z = model.encode(x)

# remember : latent shapes are n_examples x latent_channels x latent_steps
print(z.shape)

# Zeroing odd dimensions
z1 = z.clone()
z1[..., 1::2, :] = 0

# Replacing even dimensions with noise of progressively increasing amplitude
z2 = z.clone()
z2[..., ::2, :] = torch.linspace(0, 1, z.shape[-1]) * torch.randn_like(z1[..., ::2, :])

# Morphing two latent trajectories through time
z3 = (z[0] * torch.linspace(0, 1, z.shape[-1]) + z[1] * torch.linspace(1, 0, z.shape[-1]))[None]

# Interleaving two latent trajectories 
z4 = torch.stack([z[0, i, :] if i % 2 == 0 else z[1, i, :] for i in range(z.shape[1])], 0)[None]

# 128-d spherical latent trajectory
phase = torch.linspace(0, 1, z1.shape[-1])
z5 = torch.cos(2 * torch.pi * (phase[None] + torch.linspace(0, 1, 128)[:, None]))[None]

names = ['original', 'zeros', 'progressive noise', "morphing", "interleaving", "spherical"]
for i, example in enumerate([z, z1, z2, z3, z4, z5]):
    out = model.decode(example)
    for j in range(len(out)):
        print(f"{names[i]}_{j}")
        plot_audio(out[j], name=f"{names[i]}_{j}", display=True)
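
The morphing and interleaving operations above are easier to check by eye on tiny arrays. This sketch reproduces them with NumPy instead of PyTorch so that it runs without a model; `a` and `b` are hypothetical stand-ins for `z[0]` and `z[1]`, filled with zeros and ones so the result shows which source each value comes from.

```python
import numpy as np

n_dims, n_steps = 4, 5
a = np.zeros((n_dims, n_steps))  # stands in for z[0]
b = np.ones((n_dims, n_steps))   # stands in for z[1]

# morphing: the time ramp broadcasts over the latent dimensions,
# so the result starts at `b` and ends at `a`
ramp = np.linspace(0, 1, n_steps)
morph = a * ramp + b * ramp[::-1]
print(morph[0])

# interleaving: even dimensions come from `a`, odd dimensions from `b`
interleaved = np.stack([a[i] if i % 2 == 0 else b[i] for i in range(n_dims)], axis=0)
print(interleaved[:, 0])
```

The two ramps sum to one at every step, so the morph is a plain crossfade between the two trajectories rather than a change of overall level.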

OK, you have started messing around with latent trajectories through Python. Of course, using a programming language removes the cumbersomeness of dealing with 128 dimensions, allowing you to perform operations directly in the full latent space of RAVE. In the next notebook, we will learn how to analyze the graph of a module, and dissect both the encoder and the decoder.