# Bending RAVE for weird audio generations

OK, we saw the potential of bending with the simple MNIST example; but what about RAVE? The third notebook gave us a taste of RAVE's inner workings: now, let's use this knowledge to... well, we'll find out!

***Note***: we will focus on the decoder, as it is the main sound generator. You can also experiment with bending the encoder, but that alone cannot produce sounds the decoder could not already generate. Still, it can be interesting to manipulate how an incoming sound is encoded into the latent space and transformed, so it's still worth a try!

In [None]:
import torchbend as tb; tb.set_output('notebook')
from dandb import download_models, import_model
models = download_models()
current_model = models["sol_full_nopqmf"]
model = import_model(current_model)

model.print_weights(flt=r"decoder");

We notice that the convolutional kernels can be selected with the pattern "decoder.net.{layer}.aligned.branches.0.net.\d.weight_v", where {layer} is the index of a specific layer (excluding the upsampling ones) and \d is a regular-expression token that matches any single digit. Upsampling convolutions are accessible through the pattern "decoder.net.{layer}.weight_v". So, let's try applying our bending operations to these layers.
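Before doing so, here is a minimal sketch of how such a pattern picks out parameter names, using only Python's standard `re` module. The names in `names` are hypothetical, written to mimic the naming scheme described above, and the use of `re.fullmatch` is an assumption: torchbend may match patterns differently internally.

```python
import re

# Hypothetical parameter names mimicking the naming scheme described above.
names = [
    "decoder.net.3.aligned.branches.0.net.1.weight_v",
    "decoder.net.3.aligned.branches.0.net.3.weight_v",
    "decoder.net.3.aligned.branches.0.net.1.weight_g",
    "decoder.net.8.aligned.branches.0.net.1.weight_v",
    "decoder.net.2.weight_v",
]

# Residual-branch kernels of layer 3: \d matches exactly one digit.
# Note that the unescaped "." matches any character, which is harmless here.
layer = 3
pattern = r"decoder.net.%d.aligned.branches.0.net.\d.weight_v" % layer
selected = [n for n in names if re.fullmatch(pattern, n)]

# Upsampling kernel of layer 2.
up_pattern = r"decoder.net.%d.weight_v" % 2
upsampling = [n for n in names if re.fullmatch(up_pattern, n)]

print(selected)
print(upsampling)
```

Only the two `weight_v` entries of layer 3 end up in `selected`; the `weight_g` entry and the other layers are left out.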

In [None]:
from dandb import get_sounds, plot_audio

x = get_sounds().load("violin.wav")
print('original : ')
plot_audio(x[0], display=True)
z = model.encode(x)
out = model.decode(z)
print('reconstruction : ')
plot_audio(out[0], display=True)


In [None]:

layers = [r"decoder.net.%d.aligned.branches.0.net.\d.weight_v"%layer for layer in [3, 8, 12, 18]]
# uncomment the line below to bend the upsampling layers instead
# layers = [r"decoder.net.%d.weight_v"%layer for layer in [2, 6, 11, 16]]

# /!\ CALLBACKS ZONE_______________________________________________
# do not hesitate to try out different callbacks and parameters here!

# MASKING
# it's marvelous how many weights you can discard and still get similar sounds (above 30%, the difference is barely noticeable)
# these networks should be compressible...
# cb = tb.Mask(0.1)

# SCALING
# as the network is "almost" linear, positive scaling often yields overdriven gain
# weights are also normalized by the network, so effects are limited
# with negative scaling... try that yourself 🤓
cb = tb.Scale(-1.0)

# BIASING
# contrary to scaling, biasing is VERY sensitive... proceed with caution!
# cb = tb.Bias(0.05)

# NOISING
# same here, additive noise is very sensitive (as it amounts to adding stuff)
# cb = tb.Normal(std=1., op = "add")
# cb = tb.Normal(std=1., op = "mul")

for bended_layer in layers:
    model.reset()
    model.bend(cb, bended_layer, bend_graph=False, verbose=True)
    out = model.decode(z)
    print("bending %s with %s"%(bended_layer, cb))
    plot_audio(out[0], display=True)
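
The scaling comment in the cell above notes that the weights are normalized by the network, which limits the effect of scaling. Here is a minimal sketch of why, assuming the decoder uses standard weight normalization, w = g * v / ||v||, as the `weight_v` parameter names suggest. The toy vector `v` and magnitude `g` are hypothetical stand-ins for a real kernel, and the norm is taken over the whole vector rather than per output channel to keep things short.

```python
import math

def weight_norm(v, g):
    """Effective weight under weight normalization: w = g * v / ||v||."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

v = [0.3, -1.2, 0.5]  # toy direction parameter (stands in for a `weight_v` kernel)
g = 2.0               # toy magnitude parameter (stands in for `weight_g`)

w = weight_norm(v, g)
w_scaled = weight_norm([3.0 * x for x in v], g)    # positive scaling of v
w_flipped = weight_norm([-1.0 * x for x in v], g)  # negative scaling of v

# Positive scaling cancels out in the normalization...
print(all(math.isclose(a, b) for a, b in zip(w, w_scaled)))
# ...while negative scaling flips the sign of the effective weight.
print(all(math.isclose(a, -b) for a, b in zip(w, w_flipped)))
```

Under this assumption, scaling `weight_v` by any positive factor leaves the effective kernel unchanged, whereas a negative factor inverts it, which is why `tb.Scale(-1.0)` is the interesting case to listen to.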

## Bending RAVE activations

In [None]:
model.print_activations("decode", op="call_module", flt="decoder.*")

In [None]:
activations = ['decoder_net_%d_aligned_branches_1'%i for i in [4, 8, 13, 18]]


# /!\ CALLBACKS ZONE_______________________________________________
# do not hesitate to try out different callbacks and parameters here!
# (only the last uncommented `cb = ...` assignment takes effect)

# MASKING
# it's marvelous how many weights you can discard and still get similar sounds (above 30%, the difference is barely noticeable)
# these networks should be compressible...
# cb = tb.Mask(0.1)

# SCALING
# as the network is "almost" linear, positive scaling often yields overdriven gain
# weights are also normalized by the network, so effects are limited
# with negative scaling... try that yourself 🤓
# cb = tb.Scale(-1.0)

# BIASING
# contrary to scaling, biasing is VERY sensitive... proceed with caution!
# cb = tb.Bias(0.05)

# NOISING
# same here, additive noise is very sensitive (as it amounts to adding stuff)
# cb = tb.Normal(std=1., op = "add")
cb = tb.Normal(std=1., op = "mul")

# iterate over the activation names defined above (not the weight patterns)
for bended_act in activations:
    model.reset()
    model.bend(cb, bended_act, bend_param=False, verbose=True)
    out = model.decode(z)
    print("bending %s with %s"%(bended_act, cb))
    plot_audio(out[0], display=True)