# Audio scattering classification

In this example class, you will implement a sound classifier based on wavelet scattering transform coefficients (see [lecture notes](https://researchcomputing.readthedocs.io/en/latest/material/part12/notebook.html#Wavelet-Scattering-Transform)).



## Question 1

What is a wavelet? What is a filter bank? What is a wavelet scattering transform coefficient?

## Question 2

Install [kymatio](https://github.com/kymatio/kymatio).

Choose the following values for the wavelet parameters:

- `T=8192`
- `J=6`
- `Q=(16,16)`


Using the `scattering_filter_factory` function from `kymatio.scattering1d.filter_bank`, build the filter bank corresponding to these parameters.

The function is defined [here](https://github.com/kymatio/kymatio/blob/11552ed5533d566e6c60d77eecccb37dfb229dec/kymatio/scattering1d/filter_bank.py#L322).

And the docstring reads:

```
    Builds in Fourier the Morlet filters used for the scattering transform.

    Each single filter is provided as a dictionary with the following keys:
    * 'xi': normalized center frequency, where 0.5 corresponds to Nyquist.
    * 'sigma': normalized bandwidth in the Fourier.
    * 'j': log2 of downsampling factor after filtering. j=0 means no downsampling,
        j=1 means downsampling by one half, etc.
    * 'levels': list of NumPy arrays containing the filter at various levels
        of downsampling. levels[0] is at full resolution, levels[1] at half
        resolution, etc.

    Parameters
    ----------
    N : int
        padded length of the input signal. Corresponds to self._N_padded for the
        scattering object.
    J : int
        log-scale of the scattering transform, such that wavelets of both
        filterbanks have a maximal support that is proportional to 2**J.
    Q : tuple
        number of wavelets per octave at the first and second order
        Q = (Q1, Q2). Q1 and Q2 are both int >= 1.
    T : int
        temporal support of low-pass filter, controlling amount of imposed
        time-shift invariance and maximum subsampling
    filterbank : tuple (callable filterbank_fn, dict filterbank_kwargs)
        filterbank_fn should take J and Q as positional arguments and
        **filterbank_kwargs as optional keyword arguments.
        Corresponds to the self.filterbank property of the scattering object.
        As of v0.3, only anden_generator is supported as filterbank_fn.
    _reduction : callable
        either np.sum (default) or np.mean.

    Returns
    -------
    phi_f, psi1_f, psi2_f ... : dictionaries
        phi_f corresponds to the low-pass filter and psi1_f, psi2_f, to the
        wavelet filterbanks at layers 1 and 2 respectively.
        See above for a description of the dictionary structure.
```

## Question 3

Plot the first-order wavelets at original resolution (i.e., $T$ samples) in the frequency domain.

Make an interactive plot of all 63 first-order wavelets in the bank.

Use a log scale for the frequency axis. What do you observe?
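To see why a log-frequency axis is a natural choice, here is a toy NumPy sketch. It is not kymatio's filter construction: it simply builds Gaussian band-pass filters whose center frequencies are spaced geometrically, with `Q` filters per octave. The values of `xi_max` and `n_filters` and the bandwidth rule are assumptions chosen for illustration.

```python
import numpy as np

N = 8192          # number of frequency bins (same as T in Question 2)
Q = 16            # filters per octave
n_filters = 48    # illustrative count, not the 63 filters of the kymatio bank
xi_max = 0.4      # assumed highest center frequency (0.5 is Nyquist)

# Normalized frequency grid in [0, 1)
freqs = np.arange(N) / N

# Geometric spacing: each step multiplies the center frequency by 2**(-1/Q)
xis = xi_max * 2.0 ** (-np.arange(n_filters) / Q)

# Assumed constant-Q rule: bandwidth proportional to center frequency
sigmas = xis / (2 * Q)

# One Gaussian bump per row, shape (n_filters, N)
bank = np.exp(-(freqs[None, :] - xis[:, None]) ** 2 / (2 * sigmas[:, None] ** 2))

# Spacing between consecutive centers, measured in octaves
log_steps = np.diff(np.log2(xis))
```

Plotting each row of `bank` against `freqs` with a logarithmic x-axis gives a reference picture to compare with the kymatio filters.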



## Question 4

Make a similar plot, now in the time domain, using the inverse FFT.
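The step from the Fourier domain to the time domain can be sketched on a single toy filter. The Gaussian below is a stand-in, not a filter taken from kymatio, and `xi` and `sigma` are assumed values. Because the bump sits only at positive frequencies, its inverse FFT is complex, so the real and imaginary parts are worth plotting separately.

```python
import numpy as np

N = 8192
xi, sigma = 0.1, 0.01          # assumed center frequency and bandwidth
freqs = np.arange(N) / N

# Gaussian bump in the Fourier domain (positive frequencies only)
psi_f = np.exp(-(freqs - xi) ** 2 / (2 * sigma ** 2))

# Back to the time domain; fftshift moves t=0 to the middle of the array
psi_t = np.fft.fftshift(np.fft.ifft(psi_f))

# Matching time axis, centered on zero
t = np.arange(-N // 2, N // 2)
```

Plotting `psi_t.real`, `psi_t.imag` and `np.abs(psi_t)` against `t` shows the oscillation and its envelope.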

## Question 5

Show the second-order wavelets at original resolution in the time domain.

What do you observe?


## Question 6

Fetch the spoken digit data and plot the time series of speaker `george` for digit 0 in an interactive plot.

To fetch the data you can use:

```python
from kymatio.datasets import fetch_fsdd
info_dataset = fetch_fsdd(verbose=True)
```

Then you can access the data via the `info_dataset` object and read the wav files with `scipy.io.wavfile.read`.

## Question 7

What is the time duration of these signals? Give your answer simply as a number of time samples. 

Is it constant for all recordings?

What is the duration of the signal `0_george_0.wav`?

Store the signal in a variable `x`.

## Question 8

We now compute the scattering transform of `0_george_0.wav`.

To do so, use the `Scattering1D` class from `kymatio.torch`. Instantiate it via:

```python
from kymatio.torch import Scattering1D
scattering = Scattering1D(J, T, Q)
```

Choose $J=6$ and $Q=16$ (here $Q$ is not a tuple, just a single integer).


What do $J$ and $Q$ control?


Before computing the scattering transform, first convert the array into a torch tensor, then normalize the signal `x` by its maximum absolute value so that it lies between -1 and 1.
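The normalization step can be sketched in NumPy on made-up samples; the torch version is analogous. The array below is only a stand-in for the integer samples that a WAV reader typically returns.

```python
import numpy as np

# Made-up int16 samples standing in for the contents of a WAV file
x = np.array([0, 1200, -3000, 1500, -800], dtype=np.int16)

# Cast to float first: np.abs can overflow on int16, and the transform expects floats
x = x.astype(np.float32)

# Peak normalization: the largest absolute value becomes exactly 1
x = x / np.max(np.abs(x))
```

With a real recording, `torch.from_numpy(x)` would then give the tensor to pass to the scattering object.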


Then, compute the scattering transform of `x` via:

```python
Sx = scattering(x)
```

What is the shape of `Sx`? What does each dimension represent?



## Question 9

How many scattering coefficients are there in total? How many coefficients of order 0, 1 and 2?

You can access the metadata of the scattering object via `scattering.meta()`, which returns a dictionary; the `order` key gives the order of each coefficient.
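Once the per-coefficient order labels are available as an integer array, counting them is a one-liner. The sketch below uses a made-up `order` array so that it runs on its own; the real labels come from the `order` entry of the metadata and will have different counts.

```python
import numpy as np

# Made-up order labels, one per scattering coefficient (not real kymatio output)
order = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])

# counts[m] is the number of coefficients of order m
counts = np.bincount(order, minlength=3)
for m, c in enumerate(counts):
    print(f"order {m}: {c} coefficients")

total = counts.sum()
```

The same labels can serve as a boolean mask, for example `order == 1`, to select the rows of `Sx` needed in the following questions.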




## Question 10

Plot the zeroth-order scattering coefficient, both in a 1d plot and also in an `imshow` plot as a vector. 

## Question 11

Plot the first-order scattering coefficients, both in a 1d plot and also in an `imshow` plot as a matrix.

What is the name of the image plot?


## Question 12

Plot the second-order scattering coefficients, both in a 1d plot and also in an `imshow` plot as a matrix.


## Question 13

Using all the signals in the dataset, test the performance of a logistic regression classifier that uses time-averaged wavelet scattering coefficients as features.

In the dataset, recordings with index 5 or larger are assigned to the training set, and the rest to the test set (i.e., a 90-10 split).
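A minimal sketch of the split rule, assuming file names follow the pattern `<digit>_<speaker>_<index>.wav` (inferred from `0_george_0.wav`) and that the threshold is inclusive, i.e. index 5 and above goes to training. The helper names `parse_fsdd_name` and `subset_of` are hypothetical.

```python
import os

def parse_fsdd_name(path):
    """Split '<digit>_<speaker>_<index>.wav' into (digit, speaker, index)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    digit, speaker, index = stem.split("_")
    return int(digit), speaker, int(index)

def subset_of(path, first_train_index=5):
    """Return 'train' or 'test' from the recording index (assumed inclusive rule)."""
    _, _, index = parse_fsdd_name(path)
    return "train" if index >= first_train_index else "test"

print(parse_fsdd_name("0_george_0.wav"))
print(subset_of("0_george_0.wav"), subset_of("3_george_12.wav"))
```

The digit returned by `parse_fsdd_name` is also the class label for the classifier.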

For the scattering transform, pad or truncate each signal to $T=2^{13}$ samples (i.e., 8192), and use $J=8$ and $Q=12$.
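One possible way to bring every recording to the same length is to truncate long signals and zero-pad short ones. The sketch below centers the signal in the padded array, which is a design choice rather than a requirement; `fit_length` is a hypothetical helper name.

```python
import numpy as np

def fit_length(x, T=2**13):
    """Return a length-T float32 copy of x: truncated if too long, zero-padded if too short."""
    x = np.asarray(x, dtype=np.float32)[:T]
    out = np.zeros(T, dtype=np.float32)
    start = (T - len(x)) // 2          # center the signal inside the padding
    out[start:start + len(x)] = x
    return out

short = fit_length(np.ones(100))       # padded with zeros on both sides
long_ = fit_length(np.ones(10000))     # cut down to the first T samples
print(short.shape, long_.shape)
```

Stacking the outputs of `fit_length` for all files gives a single array that can be passed to the scattering transform as one batch.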

The features should be the wavelet scattering coefficients averaged over time, i.e., over all output time points (there are $2^{13}/2^{8} = 2^5$ of them, since the output is subsampled by $2^J$).
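Averaging over time reduces each signal to one feature vector. The sketch below uses random numbers in place of real scattering output, with time assumed to be the last axis; the array sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder scattering output: (n_signals, n_coefficients, n_time_frames)
Sx_all = rng.random((4, 10, 32))

# Mean over the last (time) axis: one feature vector per signal
features = Sx_all.mean(axis=-1)
print(features.shape)
```

With torch tensors the equivalent call is `Sx_all.mean(dim=-1)`, and the second dimension of `features` is the `num_input` of the model.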


For the model, you can use:

```python
from torch.nn import Linear, LogSoftmax, NLLLoss, Sequential
from torch.optim import Adam
model = Sequential(Linear(num_input, num_classes), LogSoftmax(dim=1))
optimizer = Adam(model.parameters())
criterion = NLLLoss()
```
