# Exercise 3: Audio Processing

To complete the exercise, follow the instructions, fill in the missing code, and write the answers where required. All tasks are mandatory except those marked with (N points). The optional tasks require more independent work and some extra effort. Without completing them you can get at most 75 points for the exercise (the total number of points is 100, which corresponds to grade 10). Some exercises offer more optional tasks than you need; you do not have to complete all of them, since the score is capped at 100 points.

In this exercise, you will generate simple sounds, vary their parameters and perform frequency analysis. You will also familiarize yourself with basic audio filtering and effects.

**Required libraries:** In this exercise you will be reading and writing sound files as well as processing waveforms. Loading and saving will be done using the `soundfile` package; some of the processing and visualization requires the `librosa` library (LibROSA), which you must install and import into your notebook.


In [5]:
%matplotlib notebook

import scipy
import numpy as np
import matplotlib.pyplot as plt

# Import library for sound visualization
import IPython.display as ipd

# Import librosa to work with sound and soundfile for IO
import librosa as lb
import librosa.display as lbd
import librosa.feature as lbf
import soundfile

In [6]:
# Run this cell to download the data used in this exercise
import zipfile, urllib.request, io
zipfile.ZipFile(io.BytesIO(urllib.request.urlopen("http://data.vicos.si/lukacu/multimedia/exercise3.zip").read())).extractall()

## Assignment 1: Generating sounds

The first assignment will focus on generating simple waveforms, plotting them and playing
them via speakers. It consists of three subtasks in total.

 * Generate a sine wave and plot it. The sine wave is a function of time

    \begin{equation}
    f(t) = A \sin{(\omega t + \phi)}
    \end{equation}

    where $A$ is the amplitude (from 0 to 1), $\omega$ is the angular frequency (i.e. the frequency in Hz multiplied by $2\pi$), and $\phi$ is the phase (in radians). Use the standard sampling frequency of 44.1 kHz. That means that you have to calculate the value of the waveform 44100 times for each second of your recording.

    <i>Note: Plot only the first oscillation of the selected sine wave.</i>

    ![image.png](attachment:image.png)


In [7]:
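# TODO: Generate a sine wave y and plot it
sr = 44100  # Sampling rate
frequency = 441.0  # Frequency in Hz
omega = 2 * np.pi * frequency  # Angular frequency in rad/s
t = np.arange(sr) / sr  # One second of sample times in seconds
y = np.sin(omega * t)  # Amplitude A = 1, phase phi = 0

# Plot only the first oscillation (sr / frequency = 100 samples)
n = int(round(sr / frequency))
plt.figure()
plt.plot(t[:n], y[:n])
plt.show()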



 * Using <b>IPython.display.Audio</b>, you can play an audio signal:

In [8]:
ipd.Audio(y, rate=sr) # sr = sampling rate

 * <b>soundfile.write</b> allows you to save the NumPy array of generated audio signal as a WAV file.

In [9]:
# y = audio signal, sr = sample rate
soundfile.write('output_audio.wav', y, sr)

 * The <b>librosa.display.waveplot</b> function allows us to plot the amplitude envelope of a waveform. In recent librosa releases it has been replaced by <b>librosa.display.waveshow</b>. Plot only the first oscillation of the selected sine wave.

    Note that if $y$ is monophonic, a filled curve is drawn between $\left[-\mathrm{abs}(y), \mathrm{abs}(y)\right]$. However, if $y$ is stereo, then the curve is drawn between $\left[-\mathrm{abs}(y[1]), \mathrm{abs}(y[0])\right]$, so that the left and right channels are drawn above and below the axis, respectively.

In [10]:
plt.figure(figsize=(7, 5))
lbd.waveplot(y[:100], sr=sr)


 * Sounds encountered in real life situations are never as clean as the sinusoids you generated in the previous assignment. Try adding some noise to the waveform, then plot and listen to the result. Experiment with different types of noise! For better visibility, plot only the first oscillation of the selected waveform.

In [11]:
# TODO: Add noise to the selected waveform, plot it and listen to it
# create noise
noise1 = np.random.normal(0, 1, y.shape)
noise2 = np.random.normal(0, 0.1, y.shape)

# add noise to the sound
y_noise1 = y + noise1
y_noise2 = y + noise2

# plot the new sounds
plt.figure()
plt.subplot(121)
lbd.waveplot(y_noise1[:100], sr=sr)

plt.subplot(122)
lbd.waveplot(y_noise2[:100], sr=sr)


In [12]:
# listen to result 1
ipd.Audio(y_noise1, rate=sr) # sr = sampling rate

In [13]:
# listen to result 2
ipd.Audio(y_noise2, rate=sr) # sr = sampling rate

 *  Harmonics are what give different instruments their sound color, or timbre. They are integer multiples of the primary frequency, played at a lower amplitude. Try adding multiples of the primary frequency at a lower amplitude to your sinusoid and listen to it. Experiment with odd and even multiples.

In [14]:
# TODO: Add multiples of the primary frequency at different amplitudes to your sinusoid

# harmonics of a 200 Hz primary frequency
f0 = 200
f1 = f0 * 2
f2 = f0 * 3
f3 = f0 * 5
f4 = f0 * 7

omega1 = 2 * np.pi * f1
omega2 = 2 * np.pi * f2
omega3 = 2 * np.pi * f3
omega4 = 2 * np.pi * f4

theta = 0
A = 0.7

# time axis in seconds (one second of audio); using raw sample indices here
# would make every sine evaluate to (numerically) zero
t = np.arange(sr) / sr

# sine waves
sw1 = A * np.sin((omega1 * t) + theta)
sw2 = A * np.sin((omega2 * t) + theta)
sw3 = A * np.sin((omega3 * t) + theta)
sw4 = A * np.sin((omega4 * t) + theta)

# plot one period of each harmonic
plt.figure(figsize=(9, 3))
plt.plot(sw1[:int(np.round(sr / f1))])
plt.plot(sw2[:int(np.round(sr / f2))])
plt.plot(sw3[:int(np.round(sr / f3))])
plt.plot(sw4[:int(np.round(sr / f4))])

sw_sum = sw1 + sw2 + sw3 + sw4

# plot one period of the sum; it repeats at the 200 Hz primary frequency
plt.figure(figsize=(9, 3))
plt.plot(sw_sum[:int(np.round(sr / f0))])

# listen to the result
ipd.Audio(sw_sum, rate=sr)


 * **(5 points)** Write a function to generate non-sinusoidal waveforms of your choice like square, triangle or sawtooth. You can also experiment with more exotic sounds, like chirp. Implement at least three different sounds.

    Do not use `scipy.signal` functions here, implement the equations yourself. Also pay attention to the frequency of the signal, do not generate sounds below 20Hz as those are not perceived as tones by humans.

In [91]:
# TODO: Implement and plot at least three different non-sinusoidal waveforms.
# A low frequency keeps single periods visible in the plots;
# use at least 20 Hz when generating audible tones.
frequency = 5

time1 = np.linspace(0, 1, sr, endpoint=True)
square = np.sign(np.sin(2 * np.pi * time1 * frequency))

time2 = np.linspace(0, 0.2, sr, endpoint=True)
# ramp from -1 to 1 once per period (the phase time2 * frequency wraps at 1)
sawtooth = 2 * ((time2 * frequency) % 1) - 1

m = 300
chirp = np.cos(2 * np.pi * time2 * (m * time2 + frequency))

plt.figure()
plt.subplot(121)
plt.plot(time1, square)
plt.title("Square waveform")

plt.subplot(122)
plt.plot(time2, sawtooth)
plt.title("Sawtooth waveform")

plt.figure()
plt.plot(time2, chirp)
plt.title("Chirp waveform")

# formulas
# square, sawtooth https://thewolfsound.com/sine-saw-square-triangle-pulse-basic-waveforms-in-synthesis/
# chirp https://dspfirst.gatech.edu/chapters/03spect/demos/spectrog/chirps/index.html
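A triangle wave, which the task also lists, can be derived from the sawtooth by folding the ramp. The following is only a sketch under that assumption; the function name `triangle_wave` and its parameters are illustrative and not part of the exercise template.

```python
import numpy as np


def triangle_wave(frequency, duration, sr=44100):
    """Triangle wave in [-1, 1] built from a sawtooth, without scipy.signal."""
    t = np.arange(int(duration * sr)) / sr
    saw = 2.0 * ((t * frequency) % 1.0) - 1.0  # rising ramp from -1 to 1
    return 2.0 * np.abs(saw) - 1.0             # fold the ramp into a triangle


# 50 ms of a 100 Hz triangle wave (five periods)
triangle = triangle_wave(frequency=100.0, duration=0.05)
```

The wave starts at its maximum because the sawtooth starts at -1; shifting the phase by a quarter period would make it start at zero instead.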


 * **(5 points)** Now that you know how to generate primitive waveforms, you can generate a simple tune. Write a script that generates the sequence of notes given in the list below. Determine a suitable note duration; for a better effect, insert a small pause between notes.

In [97]:
# from midiutil import MIDIFile
# from mingus.core import chords

# NOTES = ['A', 'A', 'E', 'E', 'F#', 'F#', 'E', 'E', 'D', 'D', 'C#', 'C#', 'B', 'B', 'A', 'A']
# array_of_note_numbers = [69, 69, 64, 64, 66, 66, 64, 64, 62, 62, 61, 61, 59, 59, 57, 57]

# track = 0
# channel = 0
# time = 0  # In beats
# duration = 1  # In beats
# tempo = 120  # In BPM
# volume = 100  # 0-127, as per the MIDI standard

# MyMIDI = MIDIFile(1)  # One track, defaults to format 1 (tempo track is created automatically)
# MyMIDI.addTempo(track, time, tempo)

# for i, pitch in enumerate(array_of_note_numbers):
#     MyMIDI.addNote(track, channel, pitch, time + i, duration, volume)

# with open("song.mid", "wb") as output_file:
#     MyMIDI.writeFile(output_file)

# TODO
y_sl, sr_sl = lb.load('song.wav')
ipd.Audio('song.wav')
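The tune can also be synthesized directly from the note list, without going through MIDI. The sketch below assumes equal temperament with A4 = 440 Hz and reuses the MIDI note numbers from the commented-out list; the helper names `midi_to_hz`, `make_note` and `make_tune`, as well as the note duration, pause and fade lengths, are illustrative choices rather than part of the assignment.

```python
import numpy as np

SR = 44100  # sampling rate in Hz

# MIDI note numbers for A A E E F# F# E E D D C# C# B B A A (from the list above)
NOTE_NUMBERS = [69, 69, 64, 64, 66, 66, 64, 64, 62, 62, 61, 61, 59, 59, 57, 57]


def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def make_note(freq, duration, sr=SR, amplitude=0.5):
    """Sine tone with a short linear fade in and out to avoid clicks."""
    t = np.arange(int(duration * sr)) / sr
    tone = amplitude * np.sin(2 * np.pi * freq * t)
    fade = min(int(0.01 * sr), len(tone) // 2)  # 10 ms fade
    envelope = np.ones(len(tone))
    envelope[:fade] = np.linspace(0.0, 1.0, fade)
    envelope[len(tone) - fade:] = np.linspace(1.0, 0.0, fade)
    return tone * envelope


def make_tune(notes, duration=0.4, pause=0.05, sr=SR):
    """Concatenate the notes, inserting a short silence after each one."""
    silence = np.zeros(int(pause * sr))
    parts = []
    for note in notes:
        parts.append(make_note(midi_to_hz(note), duration, sr))
        parts.append(silence)
    return np.concatenate(parts)


tune = make_tune(NOTE_NUMBERS)
```

In the notebook, `ipd.Audio(tune, rate=SR)` plays the result and `soundfile.write` can store it as a WAV file.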

## Assignment 2:  Frequency analysis

Due to the high sampling rates, visually interpreting the digital audio signal is usually difficult. Transforming the signal to the frequency spectrum allows us to interpret the signal content more directly.

 * Calculate the Fourier transform of a simple waveform using the SciPy function <b><a href="https://docs.scipy.org/doc/scipy/reference/tutorial/fft.html">fft</a></b>. You also need to divide the result by the number of points used for the FFT, which is equal to the signal length by default. Since the result is complex and symmetric, you will only use the positive-frequency half to plot the frequency components. Take the absolute value of the result and keep the first half of the values (for a one-second signal these are the first $\frac{F_s}{2}$ values, where $F_s$ is the sampling rate). The resulting spectrum should go from $0$ to $\frac{F_s}{2}$, which is the highest frequency that can be represented in the sampled signal (per the <b>Nyquist theorem</b>). Plot the results for all signals you generated in <i>Assignment 1</i>.

    <b>Question:</b> How do the formula parameters influence the frequency spectrum?

In [87]:
# TODO: Calculate Fourier transform y_fft of the signal from Assignment 1.c and plot the results
from scipy.fftpack import fft, fftfreq

# number of sample points
N = len(y)
halfN = N // 2

x = fftfreq(N, 1/sr) # Return the Discrete Fourier Transform sample frequencies

y_fft = fft(y)
y_fft = y_fft / N

plt.figure()

plt.subplot(131)
plt.plot(x[:halfN], np.abs(y_fft)[:halfN])
plt.title("y")

yn_fft = fft(y_noise1)
yn_fft /= N
plt.subplot(132)
plt.plot(x[:halfN], np.abs(yn_fft)[:halfN])
plt.ylim([0, 0.02])
plt.title("y_noise1")

yn2_fft = fft(y_noise2)
yn2_fft /= N
plt.subplot(133)
plt.plot(x[:halfN], np.abs(yn2_fft)[:halfN])
plt.ylim([0, 0.0025])
plt.title("y_noise2")

plt.figure()
yn2_fft = fft(sw_sum)
yn2_fft /= N
plt.plot(x[:halfN], np.abs(yn2_fft)[:halfN])
plt.title("sw_sum")

# the square wave and the other signals are still missing
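The same steps (transform, normalize by the number of points, keep the positive half) can be wrapped in a small helper so that the remaining signals need one call each. This is a sketch only: the name `magnitude_spectrum` is made up for illustration, and it uses NumPy's `np.fft.rfft` instead of the SciPy `fft` call above, since `rfft` returns only the non-negative frequencies.

```python
import numpy as np


def magnitude_spectrum(signal, sr):
    """Return (frequencies in Hz, magnitudes) for the positive half of the spectrum."""
    n = len(signal)
    spectrum = np.fft.rfft(signal) / n      # normalize by the number of points
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)  # runs from 0 to sr / 2
    return freqs, np.abs(spectrum)


# Example: a 441 Hz sine and a 441 Hz square wave, one second each
sr = 44100
t = np.arange(sr) / sr
sine = np.sin(2 * np.pi * 441.0 * t)
square = np.sign(sine)

freqs, sine_mag = magnitude_spectrum(sine, sr)
_, square_mag = magnitude_spectrum(square, sr)

peak_hz = freqs[np.argmax(sine_mag)]  # location of the strongest component
```

Plotting `freqs` against `square_mag` should show energy mainly at odd multiples of 441 Hz, while the sine has a single peak.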


 * **(5 points)** Aliasing can occur when the signal is sampled too sparsely, which causes high frequencies in the signal to fold back into the lower part of the spectrum and produce errors in the frequency analysis. Use one of the signals from <i>Assignment 1</i> and sample it at a rate below the Nyquist rate (i.e. the sampling rate should be lower than twice the highest frequency present in the signal). Calculate and plot the frequency spectrum.

    <b>Question:</b> Considering the human hearing range, does the standard sampling frequency of 44.1 kHz seem arbitrary?

In [89]:
# TODO: Visualize the aliasing problem on one of the signals from Assignment 1
tone_freq = 441.0             # frequency of the sine contained in y_noise1 (Hz)
nyquist_rate = 2 * tone_freq  # lowest sampling rate that still captures the tone

# Keep every step-th sample. Plain decimation applies no anti-aliasing filter,
# unlike scipy.signal.resample, whose Fourier method removes the high frequencies.
step = int(round(sr / (0.4 * nyquist_rate)))
sample = y_noise1[::step]
sr_low = sr / step  # new sampling rate in Hz

# calculation
N = len(sample)
fft_sample = fft(sample) / N
freqs = fftfreq(N, 1 / sr_low)

# show
print("y_noise1 waveform sampled at 0.4 of its Nyquist frequency")
plt.figure()
plt.plot(freqs[:N // 2], np.abs(fft_sample[:N // 2]))
plt.show()

y_noise1 waveform sampled at 0.4 of its Nyquist frequency
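Where an undersampled tone ends up in the spectrum can be predicted by folding its frequency into the range from 0 to half the sampling rate. The sketch below assumes the 441 Hz tone from Assignment 1 and a sampling rate of 0.4 times its Nyquist rate of 882 Hz; the function name `alias_frequency` is hypothetical.

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a tone of frequency f appears when sampled at fs."""
    f_folded = f % fs                    # sampling cannot tell f from f + k * fs
    return min(f_folded, fs - f_folded)  # fold into the range 0 ... fs / 2


# A 441 Hz tone sampled at 352.8 Hz (0.4 of its Nyquist rate of 882 Hz)
aliased = alias_frequency(441.0, 352.8)

# The same tone sampled at 44.1 kHz lies far below fs / 2 and is left unchanged
unchanged = alias_frequency(441.0, 44100.0)
```

Here `aliased` comes out at about 88.2 Hz, so a spectrum of the undersampled signal should show its peak there rather than at 441 Hz.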



## Assignment 3: Filtering

Audio signals can be processed to extract or attenuate certain frequency ranges. Since the design of audio filters is a large field, you will only focus on simple low- and high-pass Gaussian filters and their effects on audio signals.

<i>Note: It might be hard to hear the difference when using laptop speakers, therefore consider listening to the result using headphones.</i>


 *  Use the included function <b>gaussian_kernel</b> to calculate a kernel for performing a <b>low-pass</b> operation on an audio signal. Use the function <b>np.convolve</b> to perform the filtering. Plot and listen to the result. Choose a signal that will produce obvious results.

    ![image.png](attachment:image.png)

In [19]:
def gaussian_kernel(width, sigma):
    # width is the width of the produced kernel
    # sigma defines the shape of the Gaussian function
    
    x = np.linspace(-width / 2, width / 2, width)
    y = np.exp(-x ** 2 / (2 * sigma ** 2))
    y = y / np.sum(y); # normalize
    
    return y

In [20]:
# Load simpleLoop audio file. 
y_sl, sr_sl = lb.load('simpleLoop.wav')

# TODO: Choose an appropriate Gaussian kernel
sigma = 5
gauss_kernel = gaussian_kernel(int(sr_sl * 0.5), sigma)
# 0.001 scaling gives a short kernel whose output can be overlaid on the signal
gauss_kernel_pad = gaussian_kernel(int(0.001 * sr_sl * 0.5), sigma)

# Filter the signal y_sl using the provided gaussian_kernel function and your selected kernel
y_filtered = np.convolve(y_sl, gauss_kernel)
y_filtered_pad = np.convolve(y_sl, gauss_kernel_pad)

# Plot the results
plt.figure()
plt.plot(y_sl)       # Plot the unfiltered signal
plt.plot(y_filtered_pad) # Plot the filtered signal over it


In [21]:
# Listen to the original audio file
ipd.Audio(y_sl, rate=sr_sl)

In [22]:
# Listen to the filtered audio file
# What do you notice?
ipd.Audio(y_filtered, rate=sr_sl)

 * Convert the low-pass Gaussian kernel into a <b>high-pass</b> filter. This can be achieved by multiplying every second kernel coefficient by $-1$. The resulting kernel will remove the low-frequency components of the signal and only retain the high frequencies. Test on a sound of your choice. You can also use the provided sounds ('simpleLoop.wav', 'piano.wav').

In [23]:
# Load the audio file
y_sl_hp, sr_sl_hp = lb.load('simpleLoop.wav')

# TODO: Convert the low-pass Gaussian kernel into a high-pass filter
sigma = 5
gauss_kernel_2 = gaussian_kernel(int(sr_sl * 0.5), sigma)
gauss_kernel_2_pad = gaussian_kernel(int(0.001 * sr_sl * 0.5), sigma)
gauss_kernel_2[::2] *= -1      # flip the sign of every second coefficient
gauss_kernel_2_pad[::2] *= -1

# Filter the signal y_sl_hp using the provided gaussian_kernel function and your high-pass filter kernel
y_filtered_hp = np.convolve(y_sl_hp, gauss_kernel_2)
y_filtered_hp_pad = np.convolve(y_sl_hp, gauss_kernel_2_pad * 6)

# Plot the results
plt.figure()
plt.plot(y_sl_hp)       # Plot the unfiltered signal
plt.plot(y_filtered_hp_pad) # Plot the filtered signal over it


In [24]:
# Listen to the filtered audio file
# What do you notice?
ipd.Audio(y_filtered_hp, rate=sr_sl_hp)
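Why the sign alternation turns a low-pass kernel into a high-pass one can be checked by comparing the magnitude responses of both kernels. The sketch below repeats the provided `gaussian_kernel` so that it runs on its own; the helper `frequency_response`, the kernel width of 101 samples and the FFT size are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(width, sigma):
    # same construction as the function provided above
    x = np.linspace(-width / 2, width / 2, width)
    y = np.exp(-x ** 2 / (2 * sigma ** 2))
    return y / np.sum(y)


def frequency_response(kernel, sr, n_fft=4096):
    """Magnitude response of an FIR kernel on a grid from 0 to sr / 2."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return freqs, np.abs(np.fft.rfft(kernel, n=n_fft))


sr = 22050  # librosa's default sampling rate when loading audio
low_pass = gaussian_kernel(101, 5)

high_pass = low_pass.copy()
high_pass[::2] *= -1  # flip the sign of every second coefficient

freqs, low_response = frequency_response(low_pass, sr)
_, high_response = frequency_response(high_pass, sr)
```

Plotting both responses over `freqs` should show mirror images: the low-pass kernel passes frequencies near 0 Hz, and the sign-alternated kernel passes the same band shifted up to `sr / 2`.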

## Assignment 4: Effects

Special kinds of filters can also produce other effects. Here you will implement some of them.

 * **Delay**: A delay time-shifts the signal and adds it to itself. Write a function that introduces a delay of a specified duration. You can again use the function **np.convolve** or perform a delay as a weighted sum of the original and shifted signal using **scipy.ndimage.shift** (the **scipy.ndimage.interpolation** namespace is deprecated in newer SciPy versions). Experiment with different delay values, below and above 100 ms. Do you notice a difference?

In [25]:
# original piano
sample, sr = lb.load('piano.wav')
ipd.Audio(sample, rate=sr)

In [26]:
sample, sr = lb.load('piano.wav')

# TODO: write a function that introduces a delay of a specified duration
def delay(sample, sr, duration, gain):
    # duration: delay time in seconds, gain: weight of the delayed copy
    sample_delay_length = round(duration * sr)
    pad = np.zeros(sample_delay_length)

    delayed = np.concatenate((pad, sample))   # shifted copy
    original = np.concatenate((sample, pad))  # original padded to the same length

    return original + gain * delayed

delayed = delay(sample, sr, 2, 2)  # 2 s delay, delayed copy at gain 2

ipd.Audio(delayed, rate=sr)

# depending on the delay duration:
# - shorter: the result is heard as a single signal (fuller sound)
# - longer: the result is heard as two separate signals
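The **np.convolve** route mentioned in the task could look like the following sketch: the kernel is an impulse response with one tap for the original signal and one for the delayed copy. The function name `delay_by_convolution` and the test click are illustrative assumptions.

```python
import numpy as np


def delay_by_convolution(signal, sr, delay_seconds, gain):
    """Add a delayed, scaled copy of the signal using an impulse-response kernel."""
    shift = int(round(delay_seconds * sr))
    kernel = np.zeros(shift + 1)
    kernel[0] = 1.0        # the original (dry) signal
    kernel[shift] += gain  # the delayed copy
    return np.convolve(signal, kernel)


# A single click makes the effect easy to inspect
sr = 1000
click = np.zeros(100)
click[0] = 1.0
out = delay_by_convolution(click, sr, delay_seconds=0.05, gain=0.5)
```

The output contains the click at sample 0 and a half-amplitude copy 50 samples (50 ms) later. For long delays the kernel becomes large, so the concatenation approach is likely cheaper.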

* **Echo**: Echo is a combination of multiple delays combined with attenuation. Write a function that accepts the number of echoes and their corresponding damping factors. Display and play the results.

In [27]:
sample, sr = lb.load('piano.wav')
# TODO: write a function that accepts the number of echoes and their corresponding damping factors

# the delayed signals have different lengths, so adding them directly fails with
# "operands could not be broadcast together with shapes (154781,) (159191,)"
def add_padded(a, b):
    # add two signals of different lengths, keeping the longer length
    if len(a) < len(b):
        c = b.copy()
        c[:len(a)] += a
    else:
        c = a.copy()
        c[:len(b)] += b
    return c

# one damping factor and one delay duration (in seconds) per echo
damp_factors = [0.6, 0.5, 0.3, 0.1]
durations = [0.1, 0.3, 0.4, 0.2]

def echo(sample, sr, durations, damp_factors, N):
    eko = delay(sample, sr, durations[0], damp_factors[0])

    for i in range(1, N):
        # add another delayed copy to the running sum
        # (every delay() output also contains the original signal)
        d = delay(sample, sr, durations[i], damp_factors[i])
        eko = add_padded(eko, d)
    return eko

eko = echo(sample, sr, durations, damp_factors, 4)
ipd.Audio(eko, rate=sr)

* **(5 points) Flanger**: An effect produced by introducing a delay that depends on an external oscillator. Put simply, the delay for each sample of the output is not constant but changes based on a sinusoidal function.

In [28]:
# TODO
sample, sr = lb.load('piano.wav')

# inspiration https://publish.illinois.edu/augmentedlistening/tutorials/music-processing/tutorial-2-delay-based-effects/

# Flanger effect using a sinusoidal low-frequency oscillator (LFO) for the time-varying delay
def flanger(audio, sr):
    length = len(audio)
    n = np.arange(length)

    lfo_frequency = 1 / 2
    lfo_amp = 0.003  # delay scale in seconds
    # LFO values lie in [1, 3], so the delay varies between 3 ms and 9 ms
    lfo = 2 + np.sin((np.pi * lfo_frequency * n) / sr)

    idx = np.around(n - sr * lfo_amp * lfo).astype(int)  # read-out index
    idx = np.clip(idx, 0, length - 1)  # keep the index inside the signal

    # mix each sample with its delayed copy
    return audio + audio[idx]

flang = flanger(sample, sr)
ipd.Audio(flang, rate=sr)

* **(5 points) Distortion**: This effect changes the frequency content of the signal by adding gain to high energy frequencies and thus producing clipping. Implement it by using the formula from the lecture slides

    $$
    y(n) = \frac{(1 + k) x(n)}{1 + k |x(n)|},
    $$

    where k controls the amount of distortion.

    **Question**: How do these effects change the signal in both the time and frequency domains? To complete this task you need to understand this, not just implement the effect.
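The distortion formula above maps each sample independently, so a direct translation is short. The sketch below is one possible starting point; the function name `distortion`, the test tone and the two values of $k$ are assumptions chosen for illustration.

```python
import numpy as np


def distortion(x, k):
    """Apply y(n) = (1 + k) x(n) / (1 + k |x(n)|) sample by sample."""
    x = np.asarray(x, dtype=float)
    return (1.0 + k) * x / (1.0 + k * np.abs(x))


# Clean 441 Hz test tone, one second long
sr = 44100
t = np.arange(sr) / sr
x = 0.8 * np.sin(2 * np.pi * 441.0 * t)

y_soft = distortion(x, k=1.0)   # mild compression of the peaks
y_hard = distortion(x, k=20.0)  # peaks pushed towards +/- 1, close to clipping
```

For inputs in $[-1, 1]$ the output also stays in $[-1, 1]$, and $k = 0$ leaves the signal unchanged. Because the mapping is odd-symmetric, a pure tone should gain mainly odd harmonics, which can be checked by plotting the spectrum of `y_hard`.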

