# Dataset Augmentation

Dataset augmentation increases the amount of usable training data by applying reasonable modifications to the existing data.
With more training data, larger networks can be trained with a lower risk of overfitting.

## Dataset Augmentation for images

For images, reasonable modifications can be:

- flipping left/right (creating a mirror image)
- adding noise
- masking small parts of the image with black boxes
- rotating the image to the left or right
- changing the brightness of the image
- converting a colour image into a grayscale image
- enlarging or shrinking the image
- other modifications
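The modifications listed above can be sketched with plain NumPy. This is a minimal illustration, not the tutorial's implementation: it assumes images are float arrays of shape (height, width, 3) with values in [0, 1], and all function names are hypothetical. Rotation is restricted to 90° steps (`np.rot90`) and resizing to factor-2 nearest-neighbour scaling, because arbitrary angles and smooth resizing would need an interpolation library.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the random augmentations are reproducible

def flip_lr(img):
    """Mirror the image left/right (hypothetical helper)."""
    return img[:, ::-1]

def add_noise(img, sigma=0.05):
    """Add Gaussian noise with standard deviation sigma and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def mask_box(img, size=8):
    """Black out a randomly placed size x size box."""
    h, w = img.shape[:2]
    r = rng.integers(0, h - size + 1)
    c = rng.integers(0, w - size + 1)
    out = img.copy()
    out[r:r + size, c:c + size] = 0.0
    return out

def rotate_90(img, k=1):
    """Rotate by k * 90 degrees (k > 0: to the left, k < 0: to the right)."""
    return np.rot90(img, k)

def change_brightness(img, delta=0.1):
    """Shift all pixel values by delta and clip to [0, 1]."""
    return np.clip(img + delta, 0.0, 1.0)

def to_grayscale(img):
    """Weighted sum of R, G, B (ITU-R BT.601 luma weights), repeated to 3 channels."""
    gray = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(gray[..., None], 3, axis=-1)

def upscale_2x(img):
    """Nearest-neighbour enlargement by a factor of 2."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def downscale_2x(img):
    """Shrink by a factor of 2 by keeping every second row and column."""
    return img[::2, ::2]
```

Each function returns a new training example, so a single image can yield several, e.g. `[f(img) for f in (flip_lr, add_noise, mask_box, change_brightness, to_grayscale)]`. Rotating a non-square image and resizing change the array shape, so for a network with a fixed input size those results would have to be cropped or rescaled back.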

An implementation of image augmentation with TensorFlow is shown, e.g., in the tutorial at

https://www.tensorflow.org/tutorials/images/data_augmentation

## Dataset Augmentation for audio
For audio/speech data, reasonable modifications can be:

- time stretching (e.g., with a phase vocoder)
- pitch shifting (e.g., with a phase vocoder)
- random phase changes (e.g., with a phase vocoder)
- masking 20 ms of audio (comparable to a lost packet in a VoIP system)
- changing the level
- adding noise at a given level
- other modifications
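Several of the audio modifications listed above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the signal `x` is a mono float array sampled at `fs` Hz, and all function names are hypothetical. `time_stretch` is a basic phase vocoder (periodic Hann window, hop of a quarter window, magnitude interpolation between frames, phase accumulated from the measured per-bin phase advance); `pitch_shift` combines it with resampling by linear interpolation. Random phase changes and adding noise are not shown here.

```python
import numpy as np

def mask_segment(x, fs, dur=0.020, rng=None):
    """Set a randomly placed segment of dur seconds to zero (like a lost VoIP packet)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(dur * fs))
    start = rng.integers(0, len(x) - n + 1)
    y = x.copy()
    y[start:start + n] = 0.0
    return y

def change_level(x, gain_db):
    """Scale the signal by gain_db decibels."""
    return x * 10 ** (gain_db / 20)

def _hann(n_fft):
    # periodic Hann window: its squares overlap-add to a constant for hop = n_fft / 4
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)

def stft(x, n_fft=1024, hop=256):
    win = _hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape (n_frames, n_fft // 2 + 1)

def istft(X, n_fft=1024, hop=256):
    win = _hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win  # synthesis window
    y = np.zeros(n_fft + hop * (len(X) - 1))
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame           # overlap-add
    return y / (np.sum(win ** 2) / hop)               # constant overlap gain (1.5 here)

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase vocoder: rate > 1 shortens, rate < 1 lengthens, pitch stays the same."""
    X = stft(x, n_fft, hop)
    steps = np.arange(0, len(X) - 1, rate)  # fractional analysis frame positions
    omega = 2 * np.pi * hop * np.arange(X.shape[1]) / n_fft  # expected phase advance per hop
    phase = np.angle(X[0])
    Y = np.zeros((len(steps), X.shape[1]), dtype=complex)
    for k, s in enumerate(steps):
        i, frac = int(s), s - int(s)
        mag = (1 - frac) * np.abs(X[i]) + frac * np.abs(X[i + 1])
        Y[k] = mag * np.exp(1j * phase)
        dphi = np.angle(X[i + 1]) - np.angle(X[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap deviation to [-pi, pi]
        phase = phase + omega + dphi                       # accumulate measured phase advance
    return istft(Y, n_fft, hop)

def pitch_shift(x, semitones, n_fft=1024, hop=256):
    """Shift the pitch by stretching in time and then resampling."""
    r = 2 ** (semitones / 12)                       # frequency ratio
    stretched = time_stretch(x, 1 / r, n_fft, hop)  # about r times longer, same pitch
    idx = np.arange(0, len(stretched) - 1, r)       # read r times faster
    return np.interp(idx, np.arange(len(stretched)), stretched)
```

For example, `pitch_shift(x, 12)` raises a 440 Hz sine to about 880 Hz while keeping roughly the original duration; a few hundred samples are lost at the edges because only complete STFT frames are used.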

## Programming exercise

Write a script that adds Gaussian white noise at an SNR of 30 dB to a sine wave and plots the first three periods of the noisy sine wave.

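```python
### solution
import numpy as np
import matplotlib.pyplot as plt
f = 440                               # frequency of the sine in Hz
Fs = 48000                            # sampling rate in Hz
n = np.arange(Fs)                     # sample indices for one second
x = np.sin(2*np.pi*f*n/Fs)            # clean sine
Noise = np.random.randn(x.shape[0])   # Gaussian white noise with unit variance
TargetSNR = 30                        # target SNR in dB

# scale the noise so that the signal-to-noise energy ratio equals TargetSNR
a = np.sqrt(np.sum(x**2)/(10**(TargetSNR/10)*np.sum(Noise**2)))
y = x + a*Noise                       # noisy sine

# verify the achieved SNR
SNR = 10*np.log10(np.sum(x**2)/np.sum((x-y)**2))
assert np.abs(SNR-TargetSNR) < 1e-2, 'wrong SNR'

# plot the first three periods (3*Fs/f samples) of the noisy sine
N3 = int(np.ceil(3*Fs/f))
plt.plot(1000*n[:N3]/Fs, y[:N3])
plt.xlabel('time in ms')
plt.ylabel('amplitude')
plt.title('noisy sine, SNR = %d dB' % TargetSNR)
plt.show()
```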
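The scaling factor `a` in the solution follows from the SNR definition. Writing $w[n]$ for `Noise`, the added noise is $y[n]-x[n] = a\,w[n]$, so solving the definition for $a$ gives the expression used in the code. The number of samples in the first three periods follows from the period length $F_s/f$.

$$
\mathrm{SNR} = 10\log_{10}\frac{\sum_n x[n]^2}{\sum_n \left(a\,w[n]\right)^2}
\quad\Longrightarrow\quad
a = \sqrt{\frac{\sum_n x[n]^2}{10^{\mathrm{SNR}/10}\,\sum_n w[n]^2}}
$$

$$
N_3 = \frac{3F_s}{f} = \frac{3\cdot 48000}{440} \approx 327.3 \text{ samples}
$$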

## Exam preparation

1) At which SNR does the noise become inaudible?

2) At which SNR is the voice only just intelligible?