# Complex and Quaternion Neural Networks with SpeechBrain

This tutorial demonstrates how to use the SpeechBrain implementation of complex-valued and quaternion-valued neural networks for speech technologies. It covers the basics of high-dimensional representations and the associated neural layers: Linear, Convolution, Recurrent and Normalisation.

## Prerequisites
- [SpeechBrain Introduction](https://colab.research.google.com/drive/12bg3aUdr9mTfOGqcB5pSMABoIKPgiwcM?usp=sharing)
- [YAML tutorial](https://colab.research.google.com/drive/1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz?usp=sharing)
- [Brain Class tutorial](https://colab.research.google.com/drive/1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa?usp=sharing)
- [Speech Features tutorial](https://colab.research.google.com/drive/1CI72Xyay80mmmagfLaIIeRoDgswWHT_g?usp=sharing)

## Introduction and background

Complex numbers are an extension of real numbers to the two-dimensional space. They are composed of two parts: real and imaginary. A complex number `z` is often expressed as `z = r + ix`. Complex numbers are used in a wide variety of real-world applications as they provide an adequate algebra to manipulate concepts in the two-dimensional space (e.g. rotations, translations, phase ...). As a matter of fact, complex numbers offer a natural representation of the speech signal. For instance, the well-known Fourier transform relies on a weighted sum of complex sinusoids with increasing frequency, hence its output is defined in the complex space (i.e. amplitude and phase). Sometimes, the phase information is discarded to allow further processing in the real-valued space. However, many applications, including speaker-related ones, could benefit from the entire complex-valued representation.
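
To make the amplitude and phase remark concrete, here is a minimal sketch of a naive discrete Fourier transform in pure Python. The helper `dft` and the toy signal are illustrative assumptions and are not part of SpeechBrain; a real pipeline would rely on an FFT routine instead.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform: a weighted sum of complex sinusoids."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical toy signal: one sine cycle sampled over 8 points.
n_samples = 8
signal = [math.sin(2 * math.pi * t / n_samples) for t in range(n_samples)]

spectrum = dft(signal)
amplitude = [abs(c) for c in spectrum]      # the part that is usually kept
phase = [cmath.phase(c) for c in spectrum]  # the part that is often discarded

print(f"bin 1: amplitude={amplitude[1]:.3f}, phase={phase[1]:.3f} rad")
```

Each frequency bin is a single complex number, so keeping only `amplitude` throws away half of the values produced by the transform.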

Quaternions, on the other hand, generalise complex numbers to four dimensions and are well suited to describing the three-dimensional space. They also contain a real part (`r`) and an imaginary part, but the latter is in fact a 3D vector (`ix + jy + kz`). A quaternion `q` can be expressed as: `q = r + ix + jy + kz`. In practice, a unit quaternion defines a 3D rotation. Quaternions are extremely useful in physics, computer science, computer graphics and robotics (e.g. for kinematics). Indeed, they provide a stable, natural and smooth way of describing and interpreting movements in our 3D space.

### How do they connect to neural networks?

As soon as the modern deep learning resurgence started, researchers tried to replace the traditional real algebra with complex and quaternion numbers to fit specific tasks. As an example, complex-valued neural networks (CVNN) could be used to directly deal with the FFT output, while quaternion neural networks (QNN) could be implemented to generate realistic robot movements.

Besides the natural fit to some representations that QNN and CVNN offer, they also share an interesting property: **weight sharing**. Indeed, quaternion and complex numbers follow a specific algebra, with well-defined rules that must be translated into the developed Q-CVNN. In particular, one does not multiply two quaternions or two complex numbers in the same manner as two real numbers. Hence, all the dot products used with real-valued numbers are replaced with the corresponding complex and Hamilton products, which induces a mechanism of **weight sharing**. This mechanism has been demonstrated to be extremely useful for learning expressive representations of multidimensional inputs while preserving the internal relations that exist between the components of the signal (e.g. amplitude and phase for complex numbers).
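
One practical consequence of weight sharing can be checked with simple arithmetic. The sketch below counts the real-valued weights of a bias-free linear layer in each algebra, assuming that a complex (resp. quaternion) weight stores 2 (resp. 4) real values that are reused across all components of the product. The function name `n_real_parameters` is hypothetical and only serves the illustration.

```python
def n_real_parameters(in_features, out_features, algebra="real"):
    """Count the real-valued weights of a bias-free linear layer.

    `in_features` and `out_features` are expressed in real values.
    """
    parts = {"real": 1, "complex": 2, "quaternion": 4}[algebra]
    if in_features % parts or out_features % parts:
        raise ValueError(f"feature sizes must be divisible by {parts}")
    # The layer stores (in / parts) x (out / parts) numbers of the algebra,
    # and each of them is made of `parts` real values.
    return (in_features // parts) * (out_features // parts) * parts


for algebra in ("real", "complex", "quaternion"):
    print(algebra, n_real_parameters(256, 256, algebra))
```

Under these assumptions, the same 256-to-256 mapping needs half as many real values with complex weights and a quarter as many with quaternion weights.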

In this tutorial, we won't go into the details of all these properties as it would be way too long. Instead, we propose to detail how to use such CVNN and QNN with SpeechBrain.

### Relevant bibliography
- *Andreescu, T., & Andrica, D. (2006). Complex Numbers from A to... Z (Vol. 165). Boston: Birkhäuser.*
- *Altmann, S. L. (1989). Hamilton, Rodrigues, and the quaternion scandal. Mathematics Magazine, 62(5), 291-308.*
- **Complex Neural Networks Survey:** *Hirose, A. (2012). Complex-valued neural networks (Vol. 400). Springer Science & Business Media.*
- **All about Quaternion Neural Networks:** *Parcollet, T., (2019) Quaternion Neural Networks, PhD Thesis, Avignon Université* 

# SpeechBrain representation of Complex and Quaternions

Within SpeechBrain, algebra operations are abstracted in the neural layers, so users do not have to focus on the initial representation. In practice, it means that one does not have to declare a specific type of Tensor to use quaternion or complex numbers. More precisely, we will always manipulate real-valued Tensors. Indeed, all the operations corresponding to the different algebras can be expressed in a Tensor / matrix format, enabling an easy integration with modern GPUs.

Let's get practical: Any PyTorch Tensor generated in your recipe can be viewed as a complex or quaternion-valued Tensor. Indeed, it depends on the layer that processes it. If it's a `torch.nn.Linear`, then it will be real. If it's a `nnet.complex_networks.c_linear.CLinear` then it will be complex! 

**Wait, how do you interpret and build my tensors then?**

Simple, let's say we want to consider a Tensor made of `3` complex numbers or `3` quaternions. The different parts of the numbers will simply be concatenated in the following way:

`c_tensor = [r,r,r,x,x,x] and q_tensor = [r,r,r,x,x,x,y,y,y,z,z,z]`

This is the reason why any Tensor you declare can be seen as a complex or a quaternion Tensor for a {C/Q}-Layer in SpeechBrain, as long as the features dimension can be divided by 2 for complex and 4 for quaternion numbers. 

Now, we need to install SpeechBrain to better illustrate this.

In [None]:
%%capture
# Install from PyPI
!pip install speechbrain

Alternatively, you can also clone the repository to have access to all the recipes:

In [None]:
%%capture
!git clone https://github.com/speechbrain/speechbrain

Now, let's try to manipulate some Tensor to better understand the formalism. We start by instantiating a Tensor containing 8 real numbers.

In [None]:
import torch

T = torch.rand((1,8))
print(T)

tensor([[0.3591, 0.6284, 0.9070, 0.7837, 0.5425, 0.0614, 0.3884, 0.8479]])


Then, we access the SpeechBrain library for manipulating complex numbers and we simply display the different parts (real, imaginary).

In [None]:
from speechbrain.nnet.complex_networks.c_ops import get_real, get_imag

print(get_real(T))
print(get_imag(T))

tensor([[0.3591, 0.6284, 0.9070, 0.7837]])
tensor([[0.5425, 0.0614, 0.3884, 0.8479]])


As you can see, the initial Tensor is simply split in 2. The same happens with quaternions, where the Tensor is split in 4.
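
The quaternion case can be sketched with plain PyTorch. The snippet below assumes the concatenated layout `[r | x | y | z]` described earlier and uses `torch.chunk` rather than any SpeechBrain helper, so it only illustrates the layout and is not the library code.

```python
import torch

# Hypothetical toy tensor: 1 example holding 2 quaternions (2 x 4 = 8 real values).
T = torch.arange(8.0).reshape(1, 8)

# Assumption: the components are stored as contiguous blocks [r | x | y | z],
# so splitting the last axis in 4 equal parts recovers them.
r, x, y, z = torch.chunk(T, 4, dim=-1)

print(r)
print(x)
print(y)
print(z)
```

The first quaternion of this toy tensor is therefore `(r[0, 0], x[0, 0], y[0, 0], z[0, 0])` and not the first four values of `T`.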

# Complex and quaternion products

At the core of QNN and CVNN is the product. Of course, other specificities exist, such as the weight initialisation, specific normalisations, activation functions, etc. Nevertheless, the basic product is central to all neural network layers: a weight matrix that multiplies the input vector.

A very good thing to know is that a complex number can be represented in a real-valued matrix format:

\begin{equation}
\left(\begin{array}{rr}
a & -b \\
b & a
\end{array}\right).
\end{equation}

The same goes for a quaternion number:

\begin{equation}
\left(\begin{array}{cccc}
a & -b & -c & -d \\
b & a & -d & c \\
c & d & a & -b \\
d & -c & b & a
\end{array}\right).
\end{equation}
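
As a quick sanity check of the quaternion matrix above, the following pure-Python sketch compares a matrix-vector product with the explicit Hamilton product formula. The helper names (`hamilton_product`, `as_real_matrix`, `matvec`) are illustrative and do not come from SpeechBrain.

```python
def hamilton_product(q, p):
    """Hamilton product q * p of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def as_real_matrix(q):
    """Real-valued 4x4 matrix representation of the quaternion q."""
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


q = (1.0, 2.0, 3.0, 4.0)
p = (0.5, -1.0, 2.0, 0.0)

via_matrix = matvec(as_real_matrix(q), p)
via_formula = hamilton_product(q, p)
print(via_matrix, via_formula)
```

Both routes give the same four components, which is why a layer can store four real weight blocks and still compute a genuine Hamilton product.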

And even more interestingly, if we multiply two of these matrices, then we obtain the product corresponding to the considered algebra. For instance, the complex product between two complex numbers is defined as:

\begin{equation}
\left(\begin{array}{rr}
a & -b \\
b & a
\end{array}\right)\left(\begin{array}{lr}
c & -d \\
d & c
\end{array}\right)=\left(\begin{array}{cc}
a c-b d & -a d-b c \\
b c+a d & -b d+a c
\end{array}\right),
\end{equation}

which is equivalent to the formal definition:

\begin{equation}
(a+\mathrm{i} b)(c+\mathrm{i} d)=(a c-b d)+\mathrm{i}(a d+b c).
\end{equation}
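
The equivalence between the matrix product and the formal definition can be verified numerically against Python's built-in `complex` type. This is a small illustrative sketch; `as_real_matrix` and `matmul_2x2` are hypothetical helper names.

```python
def as_real_matrix(z):
    """Real-valued 2x2 matrix representation of the complex number z."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def matmul_2x2(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


z1 = complex(1.0, 2.0)
z2 = complex(3.0, -1.0)

product_matrix = matmul_2x2(as_real_matrix(z1), as_real_matrix(z2))
expected = z1 * z2  # Python's native complex product

# The first column of the product matrix holds (real, imag) of z1 * z2.
print(product_matrix, expected)
```

The product of the two matrices is itself the matrix representation of `z1 * z2`, so the real-valued formulation loses nothing.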

**Ok, so how is this implemented in SpeechBrain**?

Every single layer that you can call from either the complex or quaternion libraries follows two steps:
1. *init()*: Define the complex / quaternion weights as `torch.nn.Parameter` objects and initialise them with the adapted scheme.
2. *forward()*: Call the corresponding operation that implements the specific product. For instance, a complex linear layer would call `complex_linear_op()` from `speechbrain.nnet.complex_networks.c_ops`.

In practice, the `speechbrain.nnet.complex_networks.c_ops.complex_linear_op` function simply:
1. Takes the weights of the layer and builds the corresponding real-valued matrix.
2. Applies a product between the input and this matrix to simulate the complex / quaternion products.

Example:


In [None]:
def complex_linear_op(input, real_weight, imag_weight, bias):
    """
    Applies a complex linear transformation to the incoming data.

    Arguments
    ---------
    input : torch.Tensor
        Complex input tensor to be transformed.
    real_weight : torch.Parameter
        Real part of the complex weight matrix of this layer.
    imag_weight : torch.Parameter
        Imaginary part of the complex weight matrix of this layer.
    bias : torch.Parameter
        Bias added to the output (only applied when it requires gradients).
    """

    # Here we build the real-valued matrix as defined by the equations!
    cat_real = torch.cat([real_weight, -imag_weight], dim=0)
    cat_imag = torch.cat([imag_weight, real_weight], dim=0)
    cat_complex = torch.cat([cat_real, cat_imag], dim=1)

    # If the input is already 2D, i.e. [batch*time, N], we can call (add)mm directly.
    # We compute input x constructed_matrix to simulate the complex product.

    if input.dim() == 2:
        if bias.requires_grad:
            return torch.addmm(bias, input, cat_complex)
        else:
            return torch.mm(input, cat_complex)
    else:
        output = torch.matmul(input, cat_complex)
        if bias.requires_grad:
            return output + bias
        else:
            return output

# We create a single complex number
complex_input = torch.rand(1, 2)

# We create two Tensors (not Parameters here because we don't care about storing gradients).
# These tensors are the real and imaginary parts of the weight matrix.
# Both have shape [nb_complex_numbers_in, nb_complex_numbers_out],
# i.e. [nb_real_values_in // 2, nb_real_values_out // 2].
# Hence, if we define a layer with 1 complex input and 2 complex outputs:
r_weight = torch.rand((1,2))
i_weight = torch.rand((1,2))

bias = torch.ones(4) # because we have 2 complex outputs x 2 parts = 4 real values

# and we forward propagate!
print(complex_linear_op(complex_input, r_weight, i_weight, bias).shape)

torch.Size([1, 4])


**It is important to note that the quaternion implementation follows exactly the same approach.**
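
For illustration, a quaternion counterpart of the function above could look like the following sketch. It assumes the `[r | x | y | z]` input layout and the quaternion matrix given earlier; the function name `quaternion_linear_sketch` is hypothetical, the bias is left out, and the actual SpeechBrain operation may differ in its details.

```python
import torch


def quaternion_linear_sketch(inputs, r_w, i_w, j_w, k_w):
    """Bias-free quaternion linear transformation (illustrative sketch).

    inputs : [batch, 4 * n_in] tensor laid out as [r | x | y | z].
    r_w, i_w, j_w, k_w : [n_in, n_out] real tensors, one per weight component.
    """
    # Each column block produces one component of the Hamilton product
    # weight * input, following the 4x4 real-valued matrix representation.
    col_r = torch.cat([r_w, -i_w, -j_w, -k_w], dim=0)
    col_i = torch.cat([i_w, r_w, -k_w, j_w], dim=0)
    col_j = torch.cat([j_w, k_w, r_w, -i_w], dim=0)
    col_k = torch.cat([k_w, -j_w, i_w, r_w], dim=0)
    weight = torch.cat([col_r, col_i, col_j, col_k], dim=1)  # [4*n_in, 4*n_out]
    return inputs @ weight


# One quaternion as input and two quaternions as output (2 x 4 = 8 real values).
quaternion_input = torch.rand(1, 4)
weights = [torch.rand(1, 2) for _ in range(4)]

out = quaternion_linear_sketch(quaternion_input, *weights)
print(out.shape)
```

As in the complex case, the four weight blocks are reused in every column of the constructed matrix, which is where the weight sharing comes from.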

# Complex-valued Neural Networks

Once you are familiar with the formalism, you can easily derive any complex-valued neural building blocks given in `speechbrain.nnet.complex_networks`:
- 1D and 2D convolutions.
- Batch and layer normalisations.
- Linear layers.
- Recurrent cells (LSTM, LiGRU, RNN).

*According to the literature, most complex and quaternion neural networks rely on split activation functions (any real-valued activation function applied over the complex/quaternion-valued signal). For now, SpeechBrain follows this approach and does not offer any fully complex or quaternion activation function*.

## Convolution layers

First, let's define a batch of inputs (that could be the output of the FFT for example). 


In [None]:
from speechbrain.nnet.complex_networks.c_CNN import CConv1d, CConv2d

# [batch, time, features]
T = torch.rand((8, 10, 32))

# We define our layer and we want 12 complex numbers as output.
cnn_1d = CConv1d( input_shape=T.shape, out_channels=12, kernel_size=3)

out_tensor = cnn_1d(T)
print(out_tensor.shape)

torch.Size([8, 10, 24])


As we can see, we applied a complex-valued 1D convolution over the input Tensor and we obtained an output Tensor whose features dimension is equal to 24. Indeed, we requested 12 `out_channels`, which is equivalent to 24 real values. Remember: **we always work with real numbers, the algebra is abstracted in the layer itself!**

The same can be done with 2D convolution.


In [None]:
# [batch, time, fea, Channel]
T = torch.rand([10, 16, 30, 30])

cnn_2d = CConv2d( input_shape=T.shape, out_channels=12, kernel_size=3)

out_tensor = cnn_2d(T)
print(out_tensor.shape)

torch.Size([10, 16, 30, 24])


Please note that the 2D convolution is applied over the time and fea axes. The channel axis holds the real and imaginary parts: `T[:, :, :, 0:15]` is the real part and `T[:, :, :, 15:30]` is the imaginary part.

## Linear layer

In the same manner as for convolution layers, we just need to instantiate the right module and use it!

In [None]:
from speechbrain.nnet.complex_networks.c_linear import CLinear

# [batch, time, features]
T = torch.rand((8, 10, 32))

# We define our layer and we want 12 complex numbers as output.
lin = CLinear(12, input_shape=T.shape, init_criterion='glorot', weight_init='complex')

out_tensor = lin(T)
print(out_tensor.shape)

torch.Size([8, 10, 24])


Please notice that we added the `init_criterion` and `weight_init` arguments. These two parameters, which exist in **ALL** the complex and quaternion layers, define how the weights are initialised. Indeed, complex and quaternion-valued weights need a careful initialisation process, as detailed in *Deep Complex Networks* by Chiheb Trabelsi et al. and *Quaternion Recurrent Neural Networks* by Titouan Parcollet et al.
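
To give an idea of what such an initialisation involves, here is a pure-Python sketch of a polar scheme in the spirit of the *Deep Complex Networks* paper. It assumes that the modulus of each weight follows a Rayleigh distribution with `sigma = 1 / sqrt(n_in + n_out)` (Glorot criterion) and that the phase is uniform in `[-pi, pi]`. The function name `complex_glorot_init` is hypothetical and the SpeechBrain code may differ in its details.

```python
import math
import random


def complex_glorot_init(n_in, n_out, seed=0):
    """Sample complex weights in polar form (illustrative sketch).

    Returns (real_weight, imag_weight) as nested lists of shape [n_in][n_out],
    where n_in and n_out are counted in complex numbers.
    """
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(n_in + n_out)
    real_weight, imag_weight = [], []
    for _ in range(n_in):
        real_row, imag_row = [], []
        for _ in range(n_out):
            # Inverse-CDF sampling of a Rayleigh-distributed modulus.
            modulus = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            phase = rng.uniform(-math.pi, math.pi)
            real_row.append(modulus * math.cos(phase))
            imag_row.append(modulus * math.sin(phase))
        real_weight.append(real_row)
        imag_weight.append(imag_row)
    return real_weight, imag_weight


# 16 complex inputs (32 real features) and 12 complex outputs.
real_weight, imag_weight = complex_glorot_init(n_in=16, n_out=12)
print(len(real_weight), len(real_weight[0]))
```

Sampling the modulus and the phase jointly, rather than the real and imaginary parts independently, is what distinguishes this scheme from a real-valued initialisation.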

## Normalisation layers

One does not normalise a set of complex numbers (e.g. the output of a complex-valued layer) in the same manner as a set of real-valued numbers. Due to the complexity of the task, this tutorial won't go into the details. Please note that the code is fully available in the corresponding SpeechBrain library and that it strictly follows the description first given in the paper *Deep Complex Networks* by Chiheb Trabelsi et al.
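
For readers who nonetheless want some intuition, the core idea of the *Deep Complex Networks* normalisation is to whiten the real and imaginary parts jointly with their 2x2 covariance matrix instead of dividing by a single standard deviation. The pure-Python sketch below illustrates only this whitening step: the function name `whiten_complex` and the toy data are assumptions, and the learned scaling and shift of a full batch normalisation are left out.

```python
import math
import random


def whiten_complex(reals, imags, eps=1e-5):
    """Centre a batch of complex numbers and whiten it with its 2x2 covariance."""
    n = len(reals)
    mean_r = sum(reals) / n
    mean_i = sum(imags) / n
    cr = [r - mean_r for r in reals]
    ci = [i - mean_i for i in imags]

    # Covariance matrix V = [[v_rr, v_ri], [v_ri, v_ii]] (eps keeps it invertible).
    v_rr = sum(r * r for r in cr) / n + eps
    v_ii = sum(i * i for i in ci) / n + eps
    v_ri = sum(r * i for r, i in zip(cr, ci)) / n

    # Closed-form inverse square root of a 2x2 symmetric positive-definite matrix.
    s = math.sqrt(v_rr * v_ii - v_ri ** 2)
    t = math.sqrt(v_rr + v_ii + 2.0 * s)
    scale = 1.0 / (s * t)
    w_rr, w_ii, w_ri = (v_ii + s) * scale, (v_rr + s) * scale, -v_ri * scale

    out_r = [w_rr * r + w_ri * i for r, i in zip(cr, ci)]
    out_i = [w_ri * r + w_ii * i for r, i in zip(cr, ci)]
    return out_r, out_i


# Hypothetical correlated toy data: the imaginary part depends on the real part.
rng = random.Random(0)
reals = [rng.gauss(0.0, 1.0) for _ in range(1000)]
imags = [0.8 * r + 0.3 * rng.gauss(0.0, 1.0) for r in reals]

white_r, white_i = whiten_complex(reals, imags)
```

After whitening, both parts have roughly unit variance and are decorrelated, which a per-component real-valued normalisation would not guarantee.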

SpeechBrain supports both complex batch and layer normalisations:


In [None]:
from speechbrain.nnet.complex_networks.c_normalization import CBatchNorm,CLayerNorm

inp_tensor = torch.rand([10, 16, 30])

# Note that by default the complex axis is the last one, but it can be specified.
CBN = CBatchNorm(input_shape=inp_tensor.shape)
CLN = CLayerNorm(input_shape=inp_tensor.shape)

out_bn_tensor = CBN(inp_tensor)
out_ln_tensor = CLN(inp_tensor)

## Recurrent Neural Networks

Recurrent neural cells are nothing more than multiple linear layers with a time connection. Hence, SpeechBrain provides an implementation for the complex variation of LSTM, RNN and LiGRU. As a matter of fact, these models are strictly equivalent to the real-valued ones, except that Linear layers are replaced with CLinear layers!

In [None]:
from speechbrain.nnet.complex_networks.c_RNN import CLiGRU, CLSTM, CRNN

inp_tensor = torch.rand([10, 16, 40])

lstm = CLSTM(hidden_size=12, input_shape=inp_tensor.shape, weight_init='complex', bidirectional=True)
rnn = CRNN(hidden_size=12, input_shape=inp_tensor.shape, weight_init='complex', bidirectional=True)
ligru = CLiGRU(hidden_size=12, input_shape=inp_tensor.shape, weight_init='complex', bidirectional=True)

# RNNs return (output, hidden), so we only keep the output.
print(lstm(inp_tensor)[0].shape)
print(rnn(inp_tensor)[0].shape)
print(ligru(inp_tensor)[0].shape)

torch.Size([10, 16, 48])
torch.Size([10, 16, 48])
torch.Size([10, 16, 48])


Note that the output dimension is 48 as we have 12 complex numbers (24 values) times 2 directions (bidirectional RNNs).
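
The bookkeeping between the number of requested units and the size of the real-valued features axis can be summarised with a tiny helper. The function `real_output_features` is hypothetical and only restates the arithmetic used in this tutorial: 2 real values per complex number, 4 per quaternion, and twice as many features for bidirectional recurrent layers.

```python
def real_output_features(n_units, algebra, bidirectional=False):
    """Size of the real-valued features axis produced by a layer."""
    parts = {"complex": 2, "quaternion": 4}[algebra]
    directions = 2 if bidirectional else 1
    return n_units * parts * directions


# 12 complex output channels or neurons.
print(real_output_features(12, "complex"))
# 12 complex hidden units in a bidirectional RNN.
print(real_output_features(12, "complex", bidirectional=True))
```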

# Quaternion Neural Networks

Luckily, QNN within SpeechBrain follow exactly the same formalism. Therefore, you can easily derive any quaternion-valued neural networks from the building blocks given in `speechbrain.nnet.quaternion_networks`:
- 1D and 2D convolutions.
- Batch and layer normalisations.
- Linear and Spinor layers.
- Recurrent cells (LSTM, LiGRU, RNN).

*According to the literature, most complex and quaternion neural networks rely on split activation functions (any real-valued activation function applied over the complex/quaternion-valued signal). For now, SpeechBrain follows this approach and does not offer any fully complex or quaternion activation function*.

Everything we just saw with complex neural networks still holds. Hence, we can summarize everything in a single code snippet:

In [None]:
from speechbrain.nnet.quaternion_networks.q_CNN import QConv1d, QConv2d
from speechbrain.nnet.quaternion_networks.q_linear import QLinear
from speechbrain.nnet.quaternion_networks.q_RNN import QLiGRU, QLSTM, QRNN

# [batch, time, features]
T = torch.rand((8, 10, 40))

# [batch, time, fea, Channel]
T_4d = torch.rand([10, 16, 30, 40])

# We define our layers and we want 12 quaternion numbers as output (12x4 = 48 output real-values).
cnn_1d = QConv1d( input_shape=T.shape, out_channels=12, kernel_size=3)
cnn_2d = QConv2d( input_shape=T_4d.shape, out_channels=12, kernel_size=3)

lin = QLinear(12, input_shape=T.shape, init_criterion='glorot', weight_init='quaternion')

lstm = QLSTM(hidden_size=12, input_shape=T.shape, weight_init='quaternion', bidirectional=True)
rnn = QRNN(hidden_size=12, input_shape=T.shape, weight_init='quaternion', bidirectional=True)
ligru = QLiGRU(hidden_size=12, input_shape=T.shape, weight_init='quaternion', bidirectional=True)

print(cnn_1d(T).shape)
print(cnn_2d(T_4d).shape)
print(lin(T).shape)
print(lstm(T)[0].shape) # RNNs return output + hidden so we need to filter !
print(ligru(T)[0].shape) # RNNs return output + hidden so we need to filter !
print(rnn(T)[0].shape) # RNNs return output + hidden so we need to filter !


torch.Size([8, 10, 48])
torch.Size([10, 16, 30, 48])
torch.Size([8, 10, 48])
torch.Size([8, 10, 96])
torch.Size([8, 10, 96])
torch.Size([8, 10, 96])


## Quaternion Spinor Neural Networks

Spinor neural networks are a special kind of quaternion-valued NNs. As stated above, quaternions can be used to represent rotations. In the QNN layers defined above, the basic operation is `inputs x weights`, with inputs and weights being two sets of quaternions and `x` the Hamilton product.

In fact, multiplying two quaternions is equivalent to creating a new rotation that is a composition of the first rotation followed by the second one. For example: `q3 = q1 x q2` is equal to: *q3 is a rotation that is equivalent to a rotation by q1 followed by a rotation from q2*. **In this context, we aren't rotating objects, but we are composing new rotations**. Let's say you want to predict the next movement that a robot will do. In this particular case, you can use this concept to give your NN a quaternion as input (i.e. the previous movement) to produce a new quaternion as output (i.e next movement). 

Spinor Neural Networks (SNN) have been proposed to specifically model rotations. If we take the same robot example, our input would be the 3D coordinate (x,y,z) of the robot arm before the movement, while the output of the SNN would be its predicted coordinates after moving!

To do so, we need to replace the standard product performed in all layers with:

\begin{equation}
\vec{v_{output}}=q_{weight} \vec{v_{input}} q^{-1}_{weight}.
\end{equation}

This equation is the formal definition of the rotation of a vector $\vec{v}$ by a unit quaternion $q_{weight}$ (whose norm is equal to 1), with $q^{-1}_{weight}$ its inverse, which is equal to its conjugate for a unit quaternion. Both left and right products are Hamilton products.
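
The rotation formula can be tried on a small example in pure Python. In this sketch the 3D vector is embedded as a quaternion with a zero real part, and the helper names (`hamilton_product`, `conjugate`, `rotate`) are illustrative and unrelated to the SpeechBrain code.

```python
import math


def hamilton_product(q, p):
    """Hamilton product q * p of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def conjugate(q):
    """Conjugate of q, which is also its inverse when q has unit norm."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def rotate(vector, axis, angle):
    """Rotate a 3D vector around a unit axis by `angle` radians with q v q^-1."""
    half = angle / 2.0
    q = (math.cos(half), *(math.sin(half) * u for u in axis))  # unit quaternion
    v = (0.0, *vector)  # 3D vector embedded with a zero real part
    _, x, y, z = hamilton_product(hamilton_product(q, v), conjugate(q))
    return (x, y, z)


# Quarter turn of the x axis around the z axis.
rotated = rotate((1.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0), angle=math.pi / 2)
print(rotated)
```

The result points along the y axis and keeps a norm of 1, as expected from a rotation.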

**Ok, so how is this implemented in SpeechBrain?**

In exactly the same manner as for the standard Hamilton product! Indeed, such a rotation can also be represented as a matrix product:

\begin{equation}
\left(\begin{array}{ccc}
a^{2}+b^{2}-c^{2}-d^{2} & 2 b c-2 a d & 2 a c+2 b d \\
2 a d+2 b c & a^{2}-b^{2}+c^{2}-d^{2} & 2 c d-2 a b \\
2 b d-2 a c & 2 a b+2 c d & a^{2}-b^{2}-c^{2}+d^{2}
\end{array}\right).
\end{equation}

Hence, we just need to define the `quaternion_op` that follows the same usual process: 
1. Compose a real-valued matrix from the different weight components
2. Apply a matrix product between the input and this rotation matrix!
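
The two steps above can be sketched for a single quaternion weight in pure Python. The function names are hypothetical, the vector is treated as a column vector, and the quaternion is assumed to have unit norm; this is not the SpeechBrain operation, only an illustration of the rotation matrix.

```python
def rotation_matrix(q):
    """3x3 rotation matrix built from the unit quaternion q = (a, b, c, d)."""
    a, b, c, d = q
    return [
        [a * a + b * b - c * c - d * d, 2 * b * c - 2 * a * d, 2 * a * c + 2 * b * d],
        [2 * a * d + 2 * b * c, a * a - b * b + c * c - d * d, 2 * c * d - 2 * a * b],
        [2 * b * d - 2 * a * c, 2 * a * b + 2 * c * d, a * a - b * b - c * c + d * d],
    ]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


# Unit quaternion for a rotation of 120 degrees around the (1, 1, 1) axis.
q = (0.5, 0.5, 0.5, 0.5)

rotated = apply(rotation_matrix(q), (1.0, 0.0, 0.0))
print(rotated)
```

This particular rotation cycles the three axes, so the x axis is sent onto the y axis.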

[Check the code!](http://www.darnault-parcollet.fr/Parcollet/hiddennoshare/speechbrain.github.io/documentation/speechbrain.nnet.quaternion_networks.q_ops.html#speechbrain.nnet.quaternion_networks.q_ops.quaternion_linear_rotation_op)

## Turning a quaternion layer into a spinor layer

A spinor layer can be activated with a boolean parameter in all quaternion layers.
Here are a couple of examples:



In [None]:
from speechbrain.nnet.quaternion_networks.q_CNN import QConv1d
from speechbrain.nnet.quaternion_networks.q_linear import QLinear

# [batch, time, features]
T = torch.rand((8, 80, 16))

#
# NOTE: in this case the real components must be zero as spinor neural networks
# only input and output 3D vectors ! We don't do it here for the sake of compactness
#

# We define our layers and we want 12 quaternion numbers as output (12x4 = 48 output real-values).
cnn_1d = QConv1d( input_shape=T.shape, out_channels=12, kernel_size=3, spinor=True, vector_scale=True)
lin = QLinear(12, input_shape=T.shape, spinor=True, vector_scale=True)

print(cnn_1d(T).shape)
print(lin(T).shape)

torch.Size([8, 80, 48])
torch.Size([8, 80, 48])


Two remarks on Spinor layers:
1. We need to set `vector_scale` to train deep models. The vector scale is just another set of learnable parameters (`torch.nn.Parameter`) that scales down the output of each spinor layer. Indeed, the output of an SNN layer is a set of 3D vectors that are the sum of rotated 3D vectors. Quaternion rotations do not affect the magnitude of the rotated vector. Hence, by repeatedly summing rotated 3D vectors, we might very quickly end up with very large values (i.e. the training will explode).
2. You might consider using `weight_init='unitary'`. Indeed, quaternion rotations are valid only if the considered quaternion is unitary. Therefore, starting with unitary weights may facilitate the learning phase!
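
Both remarks can be illustrated with a few lines of pure Python. In this sketch a plain rotation around the z axis stands in for a quaternion rotation, and all names are hypothetical; it only shows why summed rotated vectors grow and how a quaternion is brought back to unit norm.

```python
import math
import random


def rotate_about_z(vector, angle):
    """Rotate a 3D vector around the z axis (stand-in for a quaternion rotation)."""
    x, y, z = vector
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def norm(values):
    """Euclidean norm of a tuple of numbers."""
    return math.sqrt(sum(v * v for v in values))


def normalise(q):
    """Rescale a quaternion so that it has unit norm."""
    n = norm(q)
    return tuple(c / n for c in q)


rng = random.Random(0)
n_inputs = 64

# Remark 1: each rotation preserves the norm, but the sum of many rotated
# vectors pointing in similar directions grows with the number of inputs.
rotated = [rotate_about_z((1.0, 0.0, 0.0), rng.uniform(-0.1, 0.1)) for _ in range(n_inputs)]
summed = tuple(sum(components) for components in zip(*rotated))
print(norm(rotated[0]), norm(summed))

# Remark 2: a quaternion must have unit norm to describe a valid rotation.
unit_q = normalise((1.0, 2.0, 3.0, 4.0))
print(norm(unit_q))
```

With 64 inputs, the summed vector is close to 64 times longer than each rotated vector, which is the kind of growth the vector scale is meant to counteract.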

# Putting everything together!

We provide a minimal example for both complex and quaternion neural networks:
- `speechbrain/tests/integration/neural_networks/ASR_CTC/hyperparams_complex_net.yaml`.
- `speechbrain/tests/integration/neural_networks/ASR_CTC/hyperparams_quaternion_net.yaml`.

If we take a look at one of these YAML params file, we can easily distinguish how to build our model out of the different blocks!

In [None]:
yaml_params = """ 
model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [!ref <N_batch>, null, 660]  # input_size
    conv1: !name:speechbrain.nnet.quaternion_networks.q_CNN.QConv1d
        out_channels: 16
        kernel_size: 3
    act1: !ref <activation>
    conv2: !name:speechbrain.nnet.quaternion_networks.q_CNN.QConv1d
        out_channels: 32
        kernel_size: 3
    nrm2: !name:speechbrain.nnet.quaternion_networks.q_normalization.QBatchNorm
    act2: !ref <activation>
    pooling: !new:speechbrain.nnet.pooling.Pooling1d
        pool_type: "avg"
        kernel_size: 3
    RNN: !name:speechbrain.nnet.quaternion_networks.q_RNN.QLiGRU
        hidden_size: 64
        bidirectional: True
    linear: !name:speechbrain.nnet.linear.Linear
        n_neurons: 43  # 42 phonemes + 1 blank
        bias: False
    softmax: !new:speechbrain.nnet.activations.Softmax
        apply_log: True
        """

Here, we have a very basic quaternion-valued CNN-LiGRU model that can be used to perform end-to-end CTC ASR!

In [None]:
%cd speechbrain/tests/integration/neural_networks/ASR_CTC/
!python example_asr_ctc_experiment.py hyperparams_quaternion_net.yaml

/content/speechbrain/tests/integration/neural_networks/ASR_CTC
100% 8/8 [00:04<00:00,  1.90it/s, train_loss=12.2]
100% 2/2 [00:00<00:00,  5.63it/s]
Epoch 0 complete
Train loss: 12.20
Stage.VALID loss: 4.81
Stage.VALID PER: 90.91
100% 8/8 [00:03<00:00,  2.09it/s, train_loss=7.1]
100% 2/2 [00:00<00:00,  5.54it/s]
Epoch 1 complete
Train loss: 7.10
Stage.VALID loss: 4.40
Stage.VALID PER: 96.36
100% 8/8 [00:03<00:00,  2.12it/s, train_loss=4.73]
100% 2/2 [00:00<00:00,  5.56it/s]
Epoch 2 complete
Train loss: 4.73
Stage.VALID loss: 4.32
Stage.VALID PER: 90.91
100% 8/8 [00:03<00:00,  2.08it/s, train_loss=3.7]
100% 2/2 [00:00<00:00,  5.68it/s]
Epoch 3 complete
Train loss: 3.70
Stage.VALID loss: 4.34
Stage.VALID PER: 89.09
100% 8/8 [00:03<00:00,  2.13it/s, train_loss=3.17]
100% 2/2 [00:00<00:00,  5.66it/s]
Epoch 4 complete
Train loss: 3.17
Stage.VALID loss: 4.74
Stage.VALID PER: 90.91
100% 8/8 [00:03<00:00,  2.11it/s, train_loss=2.85]
100% 2/2 [00:00<00:00,  5.70it/s]
Epoch 5 complete
Train loss:

# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please, cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
```