In [None]:
%matplotlib inline

In [None]:
from __future__ import print_function, division
import numpy as np
import matplotlib.pyplot as plt

import thinkdsp as dsp
import thinkplot

import IPython
from IPython.display import display
from ipywidgets import interact
import ipywidgets as widgets


# RELATIONSHIP BETWEEN MATHEMATICS AND MUSIC
## Tsvetoslav Nikolov

### Abstract
Music is a very complicated thing, and there are many research papers about its relation to mathematics. Part of music is logical and can be described mathematically. Another part is more complicated and can't be written down as a regular mathematical expression. In this paper, we'll highlight the connection between music and mathematics. We are going to examine the mathematics in common musical concepts, such as waveforms, frequencies, harmony, intervals and tones. We will also see how the Fourier Transform works, particularly in music (audio filters, EQs, etc.). Although you will find many examples intended to show the relation between math and music, the paper focuses on the Fourier Transform, because it's something very cool that also appears in video and image processing. By the end, you should have enough intuition to work and experiment with the Fourier Transform yourself.

### Introduction

Science and music appear completely different at first glance. Still, both mathematics and music are built on structures, patterns and relations, and both can produce beauty and elegance. Those who practice the two share many common traits: abstract thinking, imagination, etc. Can we find an “equation” to describe a piece of music? Or better yet, can we find an “equation” to predict the outcome of a piece of music? We can model sound by equations, so can we also model works of music with equations? Music is after all just many individual sounds, right? Should we invest time and money to find these equations so that all of humankind can enjoy predictable, easily described music? The answer to all of these questions is predictable and easily described: a series of emphatic “NO’s”! There is no equation that will model all works of music, and we should not spend time looking for it. But we still have the opportunity to trace the part of music where everything seems to be logical. This paper is separated into two parts. The first one describes how music is related to math in a more theoretical way, and the second part is more about audio processing (EQs, computers, audio filters, etc.).


#### About The Author

My name is Tsvetoslav Nikolov. I am 16 years old. I study Math and Computer Science at school, but I also love music and everything related to it. When I was younger, I started studying music and solfeggio. Although I am so young, I perform my own pieces with the Vratsa State Philharmonic. I am a composer, orchestrator and accordion player. As a composer, I have experience in lots of genres, such as Bulgarian folklore music (I've written some pieces for folklore dance ensembles), game music, metal and music for symphonic orchestra. I used to be interested in music only as an art, but now I see it in a very different way. Music is also a science, and we can try to express it with math functions, graphs, etc. But it will always be art, so we can't describe everything mathematically. This is my first research paper ever, so I hope you will enjoy it.

### Musical tone
In this section, we will learn how a musical tone is expressed mathematically.

A musical tone is the result of a regular vibration transmitted through the air as a sound wave. Every tone has a pitch, which is the frequency of the vibration.

The human hearing range is roughly 20 Hz to 20 kHz. That range contains a continuum of frequencies, but Western music divides every octave into just 12 tones: seven with letter names (C, D, E, F, G, A, B or Do–Re–Mi–Fa–Sol–La–Si) and five sharps/flats between them. You can find more info <a href="https://en.wikipedia.org/wiki/Musical_note#Written_notes">here</a>.
When it comes to math, I like to use “scientific pitch notation”, because it's easier: the numerical suffix indicates which octave the note is in. You can find it <a href="https://en.wikipedia.org/wiki/Scientific_pitch_notation">here</a>.

Every note has a different frequency (a different pitch), but there is more than one A note because of a concept in music called octaves. Octaves are like layers in the musical space. If you take the note A4 (440 Hz) and multiply its frequency by 2, you get A5 (880 Hz). If you multiply its frequency by 1/2, you get A3 (220 Hz). The same thing works for all of the notes in music. The most important one is probably A4, which is exactly 440 Hz, because it is the standard tuning pitch for orchestral music.
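The doubling rule can be sketched in a few lines. This is only an illustration: the helper name `a_in_octave` is made up for this example, and it assumes A4 = 440 Hz as the reference.

```python
# Frequency of the note A in octaves 0 through 7, taking A4 = 440 Hz as the reference.
A4_FREQ = 440.0

def a_in_octave(octave):
    """Return the frequency in Hz of the note A in the given octave (hypothetical helper)."""
    return A4_FREQ * 2 ** (octave - 4)

a_frequencies = {f"A{octave}": a_in_octave(octave) for octave in range(8)}
for name, freq in a_frequencies.items():
    print(f"{name}: {freq:g} Hz")
```

Every step up in octave doubles the frequency, and every step down halves it.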

Listen to two octaves of the tone A, followed by a melody played twice. The second time the melody is one octave higher, so all of the tones are the same, but the frequencies are multiplied by 2.


In [None]:
IPython.display.Audio(filename="Samples/piano_melody_octaves.wav")

Alright, now we know what a musical tone is. Let's try to express it mathematically. We need some basic terminology (what a signal is, what frequency is).

<b>Signal</b> represents a quantity that varies in time (It's just a function of time). 
<b>The frequency of a signal</b> is the number of cycles per second, which is the inverse of the period.


Now we can represent a note as a periodic signal with given frequency and amplitude. We can use both sin and cos functions. The formula is:

$$ A\sin(2\pi f t + \phi) $$ 

Where <b>A</b> is the amplitude, <b>f</b> is the frequency, <b>t</b> is the time and $ \phi $ is the phase at t=0.
So for a pure A4 (taking the amplitude to be 1 and the phase to be 0), we will have:

$$ f(t) =  \sin(2\pi \cdot 440\, t)$$

In [None]:
def calculate_sin_signal(freq=(440,), amp=1, phase=0, min_x=0, max_x=0.008, title=""):
    """
    Plots one sine signal amp * sin(2*pi*f*t + phase) for every frequency f in freq
    over the time interval [min_x, max_x] (in seconds)
    """
    plt.figure(figsize=(15, 5))
    # The time axis and the labels are shared by all of the signals
    x_vals = np.linspace(min_x, max_x, 1_000)
    plt.title('Sine signal ' + title)
    plt.xlabel('Time t')
    plt.ylabel('Amplitude A')
    for fr in freq:
        y_vals = amp * np.sin(2 * np.pi * fr * x_vals + phase)
        plt.plot(x_vals, y_vals)
    plt.show()

In [None]:
calculate_sin_signal(title=" A4")

### Intervals, Harmony

#### Intervals
In the time of the ancient Greeks, music was considered a strictly mathematical discipline that dealt with number relationships, ratios and symmetry.

The musical interval between two notes can be thought of informally as the “distance” between their two associated pitches. The piano is tuned using equal temperament, which means that the interval between any two adjacent keys (white or black) is the same. This interval is called a semitone. The interval of two semitones is a step or major second, hence a semitone is a half-step, sometimes called a minor second. An octave is 12 semitones. Pythagoras is credited with creating the first tuning system, called "Pythagorean tuning", based on the ratio 3:2. This ratio is called a 'pure' or 'perfect' fifth. You can find more info <a href="https://en.wikipedia.org/wiki/Pythagorean_tuning">here...</a> Despite the Pythagorean system of ratios, the piano is tuned in a different way.
The following equation gives us the frequency <i>f</i> of the n-th key of the keyboard:
$$ f(n) = (\sqrt[12]{2})^{n - 49} \times 440Hz $$

The ratio between two neighbour keys is $ \sqrt[12]{2}$.
Actually, it's very close to Pythagorean tuning. For example, a perfect fifth spans exactly 7 semitones, so its equal-tempered ratio is $ (\sqrt[12]{2})^7 \approx 1.4983$, which is approximately equal to $\frac{3}{2}$.
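The key formula and the comparison with the pure fifth can be checked numerically. This is a minimal sketch; the function name `key_frequency` is made up for this example, and it assumes a standard 88-key piano where key 49 is A4.

```python
def key_frequency(n, a4=440.0):
    """Frequency in Hz of the n-th piano key (assumes key 49 is A4)."""
    return a4 * 2 ** ((n - 49) / 12)

semitone = 2 ** (1 / 12)          # ratio between two neighbouring keys
tempered_fifth = semitone ** 7    # 7 semitones above the base note
pure_fifth = 3 / 2                # Pythagorean ratio

print("A4 (key 49):", key_frequency(49))
print("A5 (key 61):", key_frequency(61))
print("Equal-tempered fifth:", round(tempered_fifth, 4))
print("Difference from 3/2:", round(pure_fifth - tempered_fifth, 4))
```

The equal-tempered fifth comes out slightly smaller than 3/2, which is the small price paid for making all of the semitones equal.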

Here is a list giving common nomenclature for various intervals:

<i> half-step, or minor second</i>: 1 semitone

<i>step, major second, or whole tone</i>: 2 semitones

<i>minor third</i>: 3 semitones

<i>major third</i>: 4 semitones

<i>fourth, or perfect fourth</i>: 5 semitones

<i>tritone</i>: 6 semitones

<i>fifth, or perfect fifth</i>: 7 semitones

<i>minor sixth, or augmented fifth</i>: 8 semitones

<i>major sixth</i>: 9 semitones

<i>minor seventh, or augmented sixth</i>: 10 semitones

<i>major seventh</i>: 11 semitones

<i>octave</i>: 12 semitones

<i>minor ninth</i>: 13 semitones

<i>ninth</i>: 14 semitones

Intervals are everywhere in a musical composition. They can be harmonic or melodic. A harmonic interval means that the notes are played at the same time. A melodic interval is when the notes are played one after the other. Harmonic intervals are a little bit more complicated than melodic ones when it comes to the sound wave. The sound wave of a harmonic interval is actually the sum of two or more signals. In the second part of this paper, you will find how every complicated signal can be decomposed into a sum of $\sin$ and $\cos$ functions.

Let's try to represent a pure fifth interval. We need a base note. Let's take A4, which is exactly 440Hz, and to find the frequency of the second note, we need to compute this: 
$$ \frac{3}{2} \times 440 = 660 Hz $$
This is the representation of a pure fifth

In [None]:
#Compute the signal for A4
calculate_sin_signal(freq = [440], amp=1, phase=0, title="A4")

#Calculate the signal for E5
calculate_sin_signal(freq = [3/2 * 440], amp=1, phase=0, title="E5")

#Calculate both signals with more time on the x axis
calculate_sin_signal(freq = [3/2 * 440, 440], max_x = 0.01, title="Both A4 and E5 (pure fifth from A4)")

The shape of a periodic signal is called the <b>waveform</b>. Most musical instruments produce waveforms more complex than a sinusoid. The shape of the waveform determines the musical <b>timbre</b>, which is our perception of the quality of the sound. People usually perceive complex waveforms as rich, warm and more interesting than sinusoids.

For our next examples, we will use a Python package called thinkdsp, because that's how Python works (everything you need to know is the word 'import' :D). The library will make our code prettier.

Let's recreate our signals but this time with the dsp package.

In [None]:
A4 = dsp.SinSignal(freq=440, amp=1, offset=0)
E5 = dsp.SinSignal(freq=3/2 * 440, amp=1, offset=0)

#The graphs of both signals
plt.xlabel("Time")
plt.ylabel("Amplitude")
A4.plot()
E5.plot()
plt.title("A4 and E5")
plt.show()

#The sum of the signals
PureFifth = A4 + E5
PureFifth.plot()
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.title("The sum of both signals")
plt.show()

One of the goodies you get by importing thinkdsp is that you write less code. We can also hear the interval. First, we need to create a wave: a sequence of samples of the signal at discrete points in time. This process of converting a signal into a wave is called sampling; you can find more about it in the second part of the paper, in the Fourier Transform section.

In [None]:
#Convert the sum of the signals into wave

PureFifthWave = PureFifth.make_wave(duration=1.75)

#Then listen to the result

PureFifthWave.make_audio()

#### Harmony

Harmony is the fundamental piece of every musical composition. When we say 'Something is harmonious', we mean that it is beautiful, that it has some kind of symmetry encoded in it. In music, we can find two terms which describe whether something is harmonious or not.

<b>Consonance</b> - when something sounds good to our ears. <b>Dissonance</b> - when something sounds just terrible.

An interval can be consonant or dissonant. I will not go into more detail, because this paper is intended to be read by people who aren't music professionals.

Consonance and dissonance have a physiological basis, but they also depend on context. Something can sound awful when it's played alone, yet become pleasant when you add it to other parts of the composition.

For example, let's take a perfect fifth (3 : 2) and a tritone (the ratio of the tritone is really REALLY beautiful... It's $ \sqrt2$ :D)

To express the interval we need a fundamental note. A4 (440 Hz) is just perfect because we've already created it in the previous example.

Let's draw its sine signal again.

In [None]:
def plot_tritone(freq, note_name): 
    """
    Plots the A4 signal together with the tone a tritone above freq
    """
    #Define the tritone above freq with the same amplitude (the ratio is sqrt(2))
    tritone = dsp.sin_wave(freq=np.sqrt(2) * freq, duration=1)
    
    plt.figure(figsize=(12, 5))
    plt.title('A tritone from ' + note_name)
    plt.xlabel("Time")
    plt.ylabel("Amplitude")
    
    #Plot the A4 signal graph (we created A4 in one of the previous examples)
    A4.plot()
    tritone.segment(start=0, duration=A4.period * 3).plot()
    plt.show()

plot_tritone(freq=440, note_name="A4")


Let's hear the tritone from A4

In [None]:
IPython.display.Audio(filename="Samples/piano_tritone.wav")

Now let's hear the pure fifth which we created earlier.

In [None]:
IPython.display.Audio(filename="Samples/piano_fifth.wav")

Which one sounds more natural and harmonious? 
There are different solutions to this problem.

###### Pythagorean solution
Pythagorean tuning says that an interval is harmonious when the ratio between the frequencies of its two notes is a simple rational number. The ratio has to be simple enough. For example, the ratio $ \frac{3}{2}$ sounds more natural than $\frac{333}{332}$. We've just heard that the ratio $ \frac{3}{2}$ (pure fifth) sounds very nice compared to $\sqrt{2}$ (tritone), which is not a rational number at all.
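One way to put a number on "simple enough" is to look for the closest fraction with a small denominator. The sketch below uses Python's `fractions` module; the helper `simplest_ratio` and the denominator limit of 10 are arbitrary choices made for this illustration, not part of any tuning theory.

```python
from fractions import Fraction

def simplest_ratio(ratio, max_denominator=10):
    """Closest fraction to `ratio` with a small denominator, plus the approximation error."""
    approx = Fraction(ratio).limit_denominator(max_denominator)
    return approx, abs(ratio - float(approx))

fifth_ratio = 2 ** (7 / 12)   # equal-tempered fifth
tritone_ratio = 2 ** 0.5      # equal-tempered tritone

fifth_fraction, fifth_error = simplest_ratio(fifth_ratio)
tritone_fraction, tritone_error = simplest_ratio(tritone_ratio)
print("fifth   ~", fifth_fraction, "error", round(fifth_error, 4))
print("tritone ~", tritone_fraction, "error", round(tritone_error, 4))
```

The fifth lands almost exactly on 3/2, while the tritone stays noticeably further away from every simple fraction.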

###### Another solution
Another solution is to say that there is some kind of pattern in the wave. When you plot the two signals together, the graphs intersect each other in sync. We have the graph of the pure fifth from A4. Let's look at it again.

In [None]:
calculate_sin_signal(freq = [3/2 * 440, 440], max_x = 0.030, title="Both A4 and E5 (pure fifth from A4)")

As the graph shows, there is a nice pattern in the signals. They first line up again where x is approximately 0.0045 (that is 1/220 s, because 220 Hz is the greatest common divisor of 440 Hz and 660 Hz), and then they repeat the same pattern again until the next meeting point. We can't see the same pattern in the tritone graph, because its ratio $\sqrt2$ is irrational and the two signals never sync.
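The repeating pattern can be verified numerically. This is a small sketch under the assumption that both frequencies are whole numbers of Hz, so that their greatest common divisor is defined; the names `common_period` and `fifth` are made up for this example.

```python
import math
import numpy as np

def common_period(freq_a, freq_b):
    """Time in seconds after which two integer-frequency sines repeat together."""
    return 1 / math.gcd(freq_a, freq_b)

period = common_period(440, 660)

def fifth(t):
    """Sum of the A4 and E5 sines at time t."""
    return np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 660 * t)

t = np.linspace(0, 0.01, 500)
# Shifting the time axis by one common period must give the same values.
repeats = np.allclose(fifth(t), fifth(t + period))
print("Common period:", period, "s; pattern repeats:", repeats)
```

For the tritone no such shift exists, because 440 Hz and $440\sqrt2$ Hz have no common divisor.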

Hear two intervals. Can you guess which one is the tritone?

In [None]:
IPython.display.Audio(filename="Samples/piano_tritone_ending.wav")

You can find a nice description of the second solution <a href="https://youtu.be/cyW5z-M2yzw?t=106">here</a>.

But all of this is about an interval heard on its own, without anything playing in the background. There you can say whether it's consonant or dissonant, but when you put it inside a chord, in a different situation, it can turn into a beautiful and harmonious sound. For example, a tritone alone sounds terrible, but it works inside a chord (if you are in the C major scale, a good choice is the dominant seventh chord, G7, because it has a tritone in it, between B and F). If you don't understand the whole thing in the brackets, don't worry, it's OK. I just want to show you how one interval can sound terrible alone and OK in a different situation. That is why we can't say that something is 100% dissonant. Let's hear the tritone inside the dominant seventh chord.

This is the dominant chord alone.

In [None]:
IPython.display.Audio(filename="Samples/piano_dominant.wav")

The same chord (The third chord in the progression), but this time we have some movement in the harmony.

In [None]:
IPython.display.Audio(filename="Samples/piano_chords.wav")

Before you read the last part of the music harmony section, you have to read the second part of the paper(The Fourier transform), because we will use it.

What an orchestrator does (it doesn't matter for what kind of orchestra) is actually pure mathematics, in particular spectral analysis. The main job of an arranger (orchestrator) is to orchestrate a piece of music for given instruments. A good orchestration must sound dense and solid. To achieve this, the orchestrator must know the sound of every instrument (by sound I mean the waveform, the timbre) and also the range in which the instrument can play (its lowest and highest notes).

Alright, that was the musical part, let's start with the mathematics. There is something called a "spectrum". Basically, you take the signal ($f(t)$, where $t$ is time) and convert it to $\hat f(k)$, where $k$ is frequency. You just swap the time domain for the frequency domain of the signal. We can do this because of the Fourier Transform. The x-axis now holds all of the frequencies and the y-axis holds their amplitudes. To have a nice orchestration (it depends on the kind of orchestration and band), we need a nice and smooth low end which doesn't disturb the whole spectrum. One fundamental thing to understand is that whenever we play a bass note on a bass instrument, we don't play just the pure tone. We play the tone with all of the harmonics which come with it. So when we play a bass note A, we get lots of harmonics above the actual frequency of the note. That's why almost every bass part is simple (there are some exceptions).<img src="Images/BassPlayerMeme.jpg" width=500/>

It's funny, but it's true. Basses are the foundation of everything. When you play a note A, you actually also play the notes of the A major chord as harmonics (I - A, III - C#, V - E). If you play A and C together in the bass line, it will sound disgusting. From the tone A we get the harmonics A, C# and E, but from C we get the harmonics C, E and G. The interval between C and C# is a semitone, which sounds just fantastic (if you hear it without anything else, you will want to cover your ears :D). The C# and C harmonics also appear in every octave above them, so we get this tension in all of the octaves because of the harmonics. If you don't understand all of these musical terms, don't panic. It means that something can theoretically sound good (it sounds harmonious because of the ratios and blah blah blah :D), but because of the spectrum and the harmonics, which we will see in the next example, the sound can become terrible.
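The claim about which notes hide inside a bass tone can be sketched by listing the integer multiples of a fundamental and naming the closest equal-tempered note for each. The helpers `nearest_note` and `harmonic_notes` are made up for this illustration, and the frequency used for C3 is an approximate value.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq, a4=440.0):
    """Name of the equal-tempered note closest to `freq` (octave number dropped)."""
    semitones_from_a4 = round(12 * math.log2(freq / a4))
    # A sits at index 9 in NOTE_NAMES, so shift by 9 before wrapping around the octave.
    return NOTE_NAMES[(semitones_from_a4 + 9) % 12]

def harmonic_notes(fundamental, count=6):
    """Note names of the first `count` harmonics (integer multiples) of a tone."""
    return [nearest_note(k * fundamental) for k in range(1, count + 1)]

a_harmonics = harmonic_notes(110.0)    # A2
c_harmonics = harmonic_notes(130.81)   # C3, approximate frequency
print("A:", a_harmonics)
print("C:", c_harmonics)
```

The A harmonics contain C#, and the C harmonics contain C itself in every octave, which is where the semitone clash comes from.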

Let's analyze the spectrum of a bass note and see all of the harmonics in it.

In [None]:
def plot_spectrum(file=None, wave_data=None, title="", x_lim=4_500):
    """
    Plots the spectrum from a given wave file or thinkdsp Wave object and returns the spectrum as a thinkdsp Spectrum object
    """
    
    if wave_data is None and file is not None:
        #Read the file
        wave = dsp.read_wave(filename=file)
    else:
        wave = wave_data
    
    #Compute the spectrum with the Fourier transform. More details in the Fourier Transform section.
    wave_spectrum = wave.make_spectrum()

    #Make the graph bigger
    plt.figure(figsize=(15, 4))
    plt.title(title)
    plt.xlabel('Frequency')
    plt.ylabel('Amplitude')
    #Just to zoom in
    plt.xlim((0, x_lim))
    wave_spectrum.plot()
    plt.show()
    
    return wave_spectrum
    
def plot_waveform(file=None, wave_data=None, title="", x_lim=5):
    """
    Plots the waveform from a given wave file or thinkdsp Wave object
    """
    
    if wave_data is None and file is not None:
        #Read the file
        wave = dsp.read_wave(filename=file)
    else:
        wave = wave_data

    #Make the graph bigger
    plt.figure(figsize=(15, 4))
    plt.title(title)
    plt.xlabel('Time')
    plt.ylabel('Amplitude')
    #Just to zoom in
    plt.xlim((0, x_lim))
    wave.plot()
    plt.show()
    
plot_spectrum(file="Samples/piano_A0.wav", title="Note A0")
IPython.display.Audio(filename="Samples/piano_A0.wav")


We definitely see a lot of stuff in the spectrum. There is a high peak (the highest one) at the frequency of the note A0, followed by lots of harmonics. So keep in mind that if you think of a note played on a real instrument as a pure tone, you're wrong... Different instruments produce different harmonics, and that is what we hear as timbre. The rule is: "Lower tone - more harmonics". That is why the bass part must be the simplest part of the whole orchestration. Alright, we've just started orchestrating mathematically by just looking at the spectrum.

Let's hear the same tone A, but this time in a different octave.

In [None]:
plot_spectrum(file="Samples/piano_A5.wav", title="Note A5")
IPython.display.Audio(filename="Samples/piano_A5.wav")

As you can see by comparing this with the previous graph of the A0 tone, the lower tone on the piano (the one with the lower pitch, or frequency) has more harmonics. This phenomenon is fundamental for all mixing engineers, and we've just expressed it mathematically. If you have lots of notes in the lowest frequencies, your sound will be muddy. Because of this we have EQs, but more about them in the section named 'EQs'.

Let's see the difference if we play chords in the lower end and then in the middle or high frequencies.

In [None]:
plot_spectrum(file="Samples/piano_A0Chord.wav", title="A minor chord in the low end", x_lim=3000)
IPython.display.Audio(filename="Samples/piano_A0Chord.wav")

In [None]:
plot_spectrum(file="Samples/piano_A4Chord.wav", title="A minor chord in the middle frequencies", x_lim=3000)
IPython.display.Audio(filename="Samples/piano_A4Chord.wav")

As the graph of the low-end A minor chord shows, there is a cluster in the low frequencies. This makes our sound horrible. The orchestrator knows how different instruments sound, which really means knowing their timbres and the frequencies of their harmonics. We've tried to show this mathematically in a very basic way. If you look at the spectrum of a piece of music at a given moment in time, you should see all of the frequency ranges (lows, mids, highs) evenly weighted or following a nice curve, and then the orchestration is really solid. That's why the symphony orchestra is cool: we have a lot of different instruments, timbres and harmonics. For example, if you play A4 on a piano and then on a trumpet, the frequency is still the same, but the difference is in the timbre with all of its harmonics.

Can you imagine how powerful the Fourier Transform is? Look at this <a href="https://www.landr.com/online-audio-mastering/">website</a>. This is a real example of what we can do with the spectrum of a song (analyse it, create a unique fingerprint, etc.).

### The Fourier Transform
The main idea is that every reasonably well-behaved periodic signal in the time domain can be represented by an infinite series of sinusoids. If we have a signal $x(t)$, where $t$ is time, that repeats with period 1, we can write it as the series $f(t)$, which is also a function of time:
$$ f(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos{2\pi k t} + b_k \sin{2 \pi k t}) $$
Where $k$ is the frequency of the $k$-th sinusoid.

It doesn't matter how complex the signal is, we can always describe it as a sum of an infinite number of sinusoids. If you look at the formula above, you see that the sinusoids depend on the coefficients $a_k$ and $b_k$. The tool we use to find them is called the Fourier Transform. Let's look at the formula.

$$ X(f) = \int_{-\infty}^{+\infty}x(t)e^{-2 \pi i f t} dt $$

The formula doesn't look very clear at first, but it's not so difficult to understand. When you solve the integral for one frequency, you get one complex number <i>a + bj</i>. The real part $a$ and the imaginary part $b$ correspond, up to sign and scaling, to the coefficients $a_k$ and $b_k$. If you want to understand in depth how the Fourier Transform works, you definitely want to see <a href="https://www.youtube.com/watch?v=spUNpyF58BY">this video</a>.
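The link between the complex result and the series coefficients can be tried out numerically. This sketch makes two assumptions that are not in the formula above: the signal has period 1, so the integral is taken over one period only, and the integral is approximated by an average over uniform samples. The test signal and the name `series_coefficients` are made up for this example.

```python
import numpy as np

# One period (of length 1) sampled uniformly, without repeating the endpoint.
t = np.linspace(0, 1, 1000, endpoint=False)
# Hypothetical test signal whose coefficients we know: b_1 = 0.5 and a_3 = 0.25.
x = 0.5 * np.sin(2 * np.pi * 1 * t) + 0.25 * np.cos(2 * np.pi * 3 * t)

def series_coefficients(x, t, k):
    """Approximate a_k and b_k from the integral of x(t) * e^(-2*pi*i*k*t) over one period."""
    X_k = np.mean(x * np.exp(-2j * np.pi * k * t))   # numerical integral over [0, 1)
    # The real part is a_k / 2 and the imaginary part is -b_k / 2.
    return 2 * X_k.real, -2 * X_k.imag

for k in range(1, 4):
    a_k, b_k = series_coefficients(x, t, k)
    print(f"k={k}: a_k={a_k:.3f}, b_k={b_k:.3f}")
```

In this setup the factor 2 and the minus sign are the "sign and scaling" that separate the complex number from the pair $a_k$, $b_k$.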

But this integral is just awful. I mean, look at it, from negative infinity to infinity. Luckily, you never have to do this in practice. Instead of collecting data on a continuous infinite signal, we just collect a set of discrete samples, from sample 0 to sample N-1. The actual formula we will use is this one:
$$ x_k = \sum_{n=0}^{N-1}x_ne^{-\frac{2\pi ikn}{N}}$$
Where $x_k$ on the left is the k-th frequency bin in the spectrum and $x_n$ on the right is the n-th sample of the signal.
It's slightly different from the continuous Fourier Transform, which we described earlier. Rather than running an integral from negative infinity to infinity, we just evaluate the summation from sample 0 to sample N-1. There is a difference in the exponent also, because we can't look at a frequency and time $ft$ continuously. Instead, we look at the k-th frequency bin at the n-th sample.
$$\frac{k}{N} = f$$
$$ n=t$$
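The summation can be written almost word for word in NumPy. This is a deliberately slow sketch meant only to mirror the formula; the name `naive_dft` and the 8-sample test input are made up for this example, and the result is compared with `np.fft.fft`.

```python
import numpy as np

def naive_dft(samples):
    """Direct O(N^2) evaluation of the discrete Fourier transform sum."""
    samples = np.asarray(samples, dtype=complex)
    N = len(samples)
    n = np.arange(N)
    # One output value per frequency bin k.
    return np.array([np.sum(samples * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

# Hypothetical input: 8 samples of one cycle of a sine wave.
test_samples = np.sin(2 * np.pi * np.arange(8) / 8)
dft_result = naive_dft(test_samples)
matches_numpy = np.allclose(dft_result, np.fft.fft(test_samples))
print("Matches np.fft.fft:", matches_numpy)
print("Magnitudes:", np.round(np.abs(dft_result), 3))
```

The FFT computes the same numbers, only much faster, which is why real code uses `np.fft.fft` instead of the double loop.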

The trouble is that the continuous Fourier transform is a theoretical tool: it needs the signal as a function defined for all time, from negative infinity to infinity. A recording is different. It is a finite list of discrete samples, so there is no formula to integrate, and that is why audio software works with the discrete version of the transform instead.

Let's try to expand the Discrete Fourier Transform formula. We will do it with a closer look at the exponential.
$$ e^{-\frac{2\pi ikn}{N}}$$
We want to write it as $e^{-ib_n}$, so
$$ b_n = \frac{2\pi kn}{N} $$
And then if we substitute this into the summation, we get:
$$ x_k = \sum_{n=0}^{N-1}x_ne^{-b_ni}$$
Now we can use Euler's Formula $ e^{ix} = \cos{x}+i\sin{x} $ to compute this exponential.
$$ x_k = \sum_{n=0}^{N-1}x_n[\cos(-b_n) + i\sin(-b_n)]$$
When we sum everything up, we end up with the following result: one complex number.
$$ x_k = A_k + B_ki $$
But how can we actually use this result?

Let's say our result is $1 + 2j$. 

We need to plot this complex number on the complex plane, using its real and imaginary parts as coordinates. Once we plot the result, we can extract information from it. Let's see it in practice.

In [None]:
def plot_complex_number(z):
    """
    Plots the complex number z as a radius vector in the 2D space
    """
    plt.figure(figsize=(15, 8))
    x = z.real
    y = z.imag
    plt.quiver(0, 0, x, y, angles = "xy", scale_units = "xy", scale = 1)
    plt.xticks(range(-3, 9))
    plt.yticks(range(-3, 9))
    ax = plt.gca()
    
    ax.set_aspect('equal')
    ax.spines['bottom'].set_position('zero')
    ax.spines['left'].set_position('zero')
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    
    plt.show()

plot_complex_number(1 + 2j)

The basic idea here is that the magnitude of the complex number (or, if you think of it as a vector, the length of the vector) is the amplitude of the k-th frequency bin in the spectrum. The angle is the phase, or how much the sinusoid is shifted along the time axis (where in its cycle it starts in the time domain). The formulas are these:
We've already computed the result $ x_k = A_k + B_ki $. Then we use the Pythagorean theorem to compute the magnitude.
$$ Magnitude_k =  \sqrt{A_k^2 + B_k^2} $$
We can also compute the phase angle $ \phi $ with 

$ \phi =  \arctan\frac{B_k}{A_k} $ or $ \phi = \tan^{-1}\frac{B_k}{A_k} $

This formula gives the right angle only when $A_k > 0$. In the other quadrants the signs of both parts matter, which is what the two-argument arctangent (<i>arctan2</i>) handles.

This is the code for our example where the result is $1 + 2i$, but you can try different combinations.

In [None]:
def amplitude_phase(number):
    """
    Compute the amplitude and phase angle.
    Parameters:
        number: a complex number
    Output:
    (amplitude, phase)
    """
    real = number.real
    imag = number.imag
    magnitude = np.sqrt(real**2 + imag**2)
    #arctan2 takes the signs of both parts into account, so the angle lands in the correct quadrant
    phase = np.arctan2(imag, real)
    
    return (magnitude, phase)

def compute_result(r, i):
    result = complex(r, i)
    amplitude, phase = amplitude_phase(result)    
    print("Amplitude:", amplitude)
    print("Phase:", phase)
    
    plot_complex_number(result)
    calculate_sin_signal(amp=amplitude, phase=phase)
#     dsp.SinSignal(freq=440, amp=amp, offset=phase).plot()
    
interact(compute_result, r=1, i=2)

So now we know that the amplitude of the k-th bin (or k-th frequency in the spectrum) is $\sqrt{5} \approx 2.236$.

To understand the algorithm deeper, let's take an example.

Let's say we have a signal (sinusoid with a frequency equal to 1Hz and amplitude of 1)

In [None]:
#parameters for our signal
freq = 1
amp=1

#The time domain
t_vals = np.linspace(0, 1, 1_000)
#amplitude vals, we ignore the phase for now
amp_vals = amp * np.sin(2 * np.pi * freq * t_vals)
plt.figure(figsize=(12, 5))
plt.title('Sine signal 1Hz')
plt.xlabel('Time t')
plt.ylabel('Amplitude A')
plt.plot(t_vals, amp_vals)

#Our sampling rate
sampling_rate = 8

#we take discrete points of the signal, exactly 8 points 
step = round(len(amp_vals) / sampling_rate)
t_samples_vals = t_vals[::step]
#the amplitude vals of these points
amp_samples_vals = amp_vals[::step]
#plot them
plt.scatter(t_samples_vals, amp_samples_vals)
plt.show()

#Compute the FFT using the np.fft module
signal_fft = np.fft.fft(amp_samples_vals)
#Now we have the complex coefficients, we need to calculate the amplitude of each one
fft = np.asarray([amplitude_phase(f) for f in signal_fft])[:, 0]
#And plot everything
plt.figure(figsize=(12, 5))
plt.title("The Fourier transform of the signal")
plt.plot(fft, 'o')
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
plt.show()

Now we can see the spectrum of a 1 Hz signal. But why do we see two frequencies with the same amplitude? Our signal doesn't contain a 7 Hz component. What’s going on?

Because we've sampled the signal at 8 samples per second, the highest frequency which we can detect is 8/2 = 4 Hz (half of our sampling rate). The bins above it are mirror images of the bins below it, so the peak in bin 7 is just a reflection of the peak in bin 1. The problem is that when you evaluate the signal at discrete points in time, you lose information. The term for this limit is <a href='https://en.wikipedia.org/wiki/Nyquist_frequency'><b>nyquist frequency</b></a>.
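The loss of information can be seen directly by sampling a 1 Hz sine and a 7 Hz sine at the same 8 points. This is a small sketch with made-up variable names; it assumes exact sample times n/8.

```python
import numpy as np

sampling_rate = 8                              # samples per second
n = np.arange(sampling_rate)                   # sample indices for one second
sample_times = n / sampling_rate

one_hz = np.sin(2 * np.pi * 1 * sample_times)
seven_hz = np.sin(2 * np.pi * 7 * sample_times)

# At these sample times the 7 Hz sine equals the 1 Hz sine flipped in sign.
indistinguishable = np.allclose(seven_hz, -one_hz)
nyquist = sampling_rate / 2
print("Nyquist frequency:", nyquist, "Hz")
print("7 Hz looks like an inverted 1 Hz sine:", indistinguishable)
```

From 8 samples alone there is no way to tell the two signals apart, which is why the spectrum shows a peak in both bins.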

### The Fourier Transform In Music

In the first part of the paper, you can see how math can describe the music theory behind orchestration, etc. In this part, we will focus more on audio engineering. I have experience with programs for audio editing, but I had never asked myself about the math behind the scenes. From my point of view, you don't need to learn math to be a composer, orchestrator, etc. But if you want to invent new things and experiment with them, particularly in music, this is the right place for you. The Fourier Transform gives us a lot of stuff to work with. As you can see in one of the previous parts (orchestration and harmony), we can use the spectrum of a song or orchestration to analyze it. We can even recognize and classify the instruments in a song, which means that we can use ML to analyze the timbres. Imagine how useful an application would be if it could tell you details about your orchestration and arrangement: where in the song you have to add more bass, or where you have to remove it. We can use an EQ to increase the volume of some frequencies, but it will not sound as good as changing the notes which the instruments play. In the previous section, you can find information about the Fourier Transform and how it works. Here you can find examples of its usage in music production.

#### EQs

One of the most powerful techniques for manipulating audio is equalization (EQ). Equalization allows all kinds of magic, such as the ability to pull out a voice from the background, accentuate the bass, suppress a particularly noisy instrument or cut tinny-sounding higher frequencies. There are a few different kinds of EQs. Perhaps the most famous types are graphic and parametric. Here is what they look like.

##### Parametric EQ
<img src="Images/parametric_eq.png">

##### Graphic EQ
<img src="Images/graphic_eq.png">

If you look closer, you can see that it's just like the graph of a function. We have the x-axis for the frequencies and the y-axis for the amplitudes. 

Using a parametric EQ you can deal with:
1. Frequencies (the center of the frequency range to be cut or boosted)
2. Gain (the amount of boost or cut)
3. Q (the "sharpness" of the boost or cut, with higher Q meaning a narrower filter)

But with a graphic EQ, each filter is the same shape and has just one control -  the amount of boost or cut.
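To get a feeling for what frequency, gain and Q do, here is a sketch of a single parametric band. It assumes the widely used "peaking EQ" biquad from Robert Bristow-Johnson's Audio EQ Cookbook, which is one common way to build such a band and not necessarily what any particular EQ plugin does. The function name `peaking_eq_gain_db` and the sample rate of 44 100 Hz are choices made for this illustration.

```python
import numpy as np

def peaking_eq_gain_db(freqs, center, gain_db, q, sample_rate=44_100):
    """Gain in dB of an RBJ-cookbook peaking filter at the given frequencies."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * center / sample_rate
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    # Evaluate H(z) = (b0 + b1 z^-1 + b2 z^-2) / (a0 + a1 z^-1 + a2 z^-2) on the unit circle.
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs, dtype=float) / sample_rate)
    powers = np.array([z_inv ** 0, z_inv, z_inv ** 2])
    response = (b @ powers) / (a @ powers)
    return 20 * np.log10(np.abs(response))

freqs = np.array([100.0, 1_000.0, 10_000.0])
narrow = peaking_eq_gain_db(freqs, center=1_000, gain_db=6, q=4)
wide = peaking_eq_gain_db(freqs, center=1_000, gain_db=6, q=0.5)
print("Narrow (Q=4):  ", np.round(narrow, 2))
print("Wide   (Q=0.5):", np.round(wide, 2))
```

Both bands boost 1 kHz by the requested 6 dB, but the band with the higher Q leaves the frequencies far from the center almost untouched.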

Whenever we move from the time domain to the frequency domain, the Fourier Transform is the idea behind it, and EQs are no exception. An EQ lets us look at the frequencies of our song and interact with them, so we can change the whole sound of the song. If there are too many bass frequencies, pull them down, and boom - you have a balanced low end. If there are too many highs, pull those down. EQs are used in practically every mixing process.


#### Audio filters

There are different types of audio filters. As we said before, when a song has too much energy concentrated in the low end, we need to reduce its low frequencies by using an EQ. This can be done with a high-pass filter. This filter, surprisingly, passes the high frequencies :D. The audio filters determine the whole shape of the spectrum. And once we've changed the spectrum, we can compute the wave again with the Inverse Discrete Fourier Transform, whose formula is very similar to that of the Fourier Transform.

<img src="Images/filters_types.webp"></img>

More info about the types <a href="https://en.wikipedia.org/wiki/Audio_filter"> here</a>

In the previous part, we saw the spectrum of a bass note with all of its harmonics. When we play a bass note, we get energy along the whole spectrum, not just in the low frequencies. We can reduce it by removing the harmonics with a low-pass filter.
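Before using the library, here is a small sketch of the idea with plain numpy: transform, zero the bins above the cutoff, transform back. The sample rate, the 55 Hz note, its harmonics and the 80 Hz cutoff are arbitrary values chosen for this illustration, not measurements from the piano sample.

```python
import numpy as np

sample_rate = 8000                            # samples per second (assumed for this sketch)
t = np.arange(sample_rate) / sample_rate      # one second of time stamps

# A 55 Hz fundamental plus two weaker harmonics standing in for a bass note.
partials = {55: 1.0, 110: 0.5, 165: 0.25}
signal = sum(amp * np.sin(2 * np.pi * freq * t) for freq, amp in partials.items())

# Forward transform: one complex value per frequency bin.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# Low-pass: zero every bin above the cutoff, then transform back to a wave.
cutoff_hz = 80
filtered_spectrum = np.where(freqs > cutoff_hz, 0, spectrum)
filtered = np.fft.irfft(filtered_spectrum, n=len(signal))

# What is left should be the fundamental alone.
fundamental_only = partials[55] * np.sin(2 * np.pi * 55 * t)
print("max deviation from the pure fundamental:", np.max(np.abs(filtered - fundamental_only)))
```

The harmonics disappear completely here because every partial falls exactly on a frequency bin. With a real recording the energy is smeared over neighbouring bins, so such a hard cut is a rough tool.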

Let's see how audio filters can help us in practice. We're going to use <i>thinkdsp</i>.

In [None]:
#read the wav file
A0 = dsp.read_wave(filename="Samples/piano_A0.wav")
#compute the spectrum
A0_sp = plot_spectrum(wave_data = A0, x_lim=3e3)

IPython.display.Audio(filename="Samples/piano_A0.wav")

Now, let's try to remove all of the frequencies above, let's say, 500Hz.

In [None]:
A0_new_spectrum = A0_sp.copy()
A0_new_spectrum.low_pass(cutoff=500)
plot_spectrum(wave_data = A0_new_spectrum.make_wave(), x_lim=3e3)
A0_new_spectrum.make_wave().make_audio()

As we can see, the low-pass filter has removed all of the frequencies above the cut-off frequency. But which recording sounds more like a real piano? As we remove the harmonics, we make the sound more unnatural. The note is still the same, but the timbre isn't what we expected. That's why it's better to change the orchestration when we want to remove some of the harmonics. For example, it would be better to choose a different instrument to play this note, an instrument with different harmonics. It would be super cool to have a computer which could perform this mathematical task for us, a computer which could reason about timbres and use them in the best way. This is the moment when math crosses the border.

Let's take another example. When we record a human voice, we usually have to fix lots of problems. One of them is that the sibilant consonants ("s", "z", "ch", "j" and "sh") can sound harsh to our ears. The process of fixing this is called de-essing. The easiest way is to use an EQ and some audio filters.

In [None]:
sibilance = dsp.read_wave(filename="Samples/sibilance_speech.wav")
plot_waveform(wave_data = sibilance, title="Waveform")
sibilance_speech = plot_spectrum(wave_data=sibilance, title="Spectrum")
sibilance.make_audio()

In [None]:
#Load the sample of just one pure sibilance sound to analyze the spectrum

sibilance_sample = dsp.read_wave(filename="Samples/sibilance_sound.wav")
plot_waveform(wave_data = sibilance_sample, title="Waveform")
sibilance_sample_spectrum = plot_spectrum(wave_data=sibilance_sample, title="Spectrum", x_lim=19_000)
sibilance_sample.make_audio()

As the spectrum shows, there is a lot of energy between 12kHz and 15kHz, so the sibilance must be there. Let's try to reduce this energy and hear the result. The easiest way to do that is to apply a low-pass filter, which removes all of the frequencies above the cut-off frequency.

In [None]:
n = sibilance_speech.copy()
n.low_pass(cutoff = 6.25e3)
plt.figure(figsize=(15, 5))
plt.xlim(0, 16e3)
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
n.plot(color="black", label="Low pass filter")
sibilance_speech.plot(color="grey", label="Actual sound")
plt.legend()
deesed_speech = n.make_wave()
deesed_speech.make_audio()

And if we plot both signals in the time domain, we can see the differences in the waveforms.

In [None]:
plt.figure(figsize=(17, 10))
deesed_speech.plot()
sibilance.plot(color="grey")
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.show()

Compared to the previous recording it's better, but this was only the easiest way. If we want to actually remove the sibilance without hurting the voice, we need to work on the frequencies more carefully, maybe by adding more filters. In general, if we find the frequency or frequencies at which something is playing, we can make it quieter by reducing those frequencies. For example, if we have a whole track with a lot of instruments in it and we want to reduce the snare drum, we can find its frequencies and pull them down. There is a nice <a href="https://producelikeapro.com/blog/eq-cheat-sheet/">cheat sheet</a> on this topic. Another case in which we want to reduce particular frequencies is mixing. For instance, if the bass guitar and the kick drum in a song occupy the same frequencies (50 - 80Hz), we don't need to change the frequencies of the whole mix. All we need to do is open an EQ on the bass guitar track and reduce the frequencies between 50 and 80Hz, so the bass guitar doesn't mask the kick. We can do this because of audio filters, EQs and the Fourier Transform. We can also remove noise from a track by analyzing its frequencies, just like we did in the previous example.
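Reducing a range of frequencies instead of removing everything above a cutoff can be sketched with numpy as follows. The helper `attenuate_band` is a hypothetical name for this example, and the two sine waves are synthetic stand-ins for a voice and its sibilance; the band limits and the 12 dB reduction are arbitrary.

```python
import numpy as np

def attenuate_band(signal, sample_rate, low_hz, high_hz, reduction_db):
    """Scale down the spectrum between low_hz and high_hz by reduction_db decibels."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[in_band] *= 10 ** (-reduction_db / 20)   # decibels to a linear amplitude factor
    return np.fft.irfft(spectrum, n=len(signal))

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
voice_like = np.sin(2 * np.pi * 200 * t)          # stands in for the body of the voice
hiss_like = 0.5 * np.sin(2 * np.pi * 7000 * t)    # stands in for the sibilant energy

cleaned = attenuate_band(voice_like + hiss_like, sample_rate, 5000, 9000, reduction_db=12)

# Read the amplitude of each component back from the spectrum (bins are 1 Hz apart here).
amplitudes = np.abs(np.fft.rfft(cleaned)) * 2 / len(cleaned)
print("200 Hz amplitude:", round(amplitudes[200], 3))
print("7 kHz amplitude:", round(amplitudes[7000], 3))
```

Only the component inside the band gets quieter, so the rest of the sound keeps its timbre. On real recordings a band with such hard edges can cause audible ringing, which is one reason EQs use filters with smooth slopes.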

If you want to learn more about audio processing, there is an awesome library called <a href="https://librosa.github.io/librosa/">"Librosa"</a>. There is a <a href="https://musicinformationretrieval.com/">nice notebook </a> which will introduce you to the world of audio processing and its application in machine learning.

###  *Bonus Topic
#### Spectrogram

Because thinkdsp has a problem with the newer version of numpy, we need to import another cool library named librosa.

#### What is a spectrogram? 

Well, we've already seen how to go from the time domain to the frequency domain, but in every graph until now the y-axis has shown amplitude values, and the spectrum has told us nothing about when each frequency sounds. This is where the spectrogram comes in. Basically, it's the same as the spectrum, but we have time on the x-axis and frequency on the y-axis, while the amplitude is shown as color. That's it!

#### How does it work?

It works like the Discrete Fourier Transform, but we have a window with a given width which slides along the time axis, and we compute the Discrete Fourier Transform at every step. This is called the Short-Time Fourier Transform, or <a href="https://en.wikipedia.org/wiki/Short-time_Fourier_transform">STFT</a>.

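$$ X(m, \omega) = \sum_{n} x(n)\, w(n-m)\, e^{-j\omega n} $$

Here $x(n)$ is the signal, $w(n)$ is the window function, $m$ is the position of the window in time, and $\omega$ is the frequency. When we increase $m$, we move the window to the right and then compute the transform of the windowed segment $x(n)w(n-m)$.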
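The sliding window can be written out by hand in a few lines of numpy. This is a simplified sketch, not librosa's implementation: the function name `simple_stft`, the Hann window, and the test signal are choices made for this example, and unlike librosa's default it does not pad or center the frames.

```python
import numpy as np

def simple_stft(signal, n_fft, hop_length):
    """One DFT per windowed frame.

    Returns an array of shape (n_fft // 2 + 1, number_of_frames).
    """
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop_length)   # window positions m
    frames = [np.fft.rfft(signal[m:m + n_fft] * window) for m in starts]
    return np.array(frames).T

sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
# First second: 440 Hz, second second: 880 Hz.
signal = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

X = simple_stft(signal, n_fft=512, hop_length=256)
bin_hz = sample_rate / 512                                   # spacing between frequency bins
print("shape:", X.shape)
print("loudest frequency in first frame:", np.argmax(np.abs(X[:, 0])) * bin_hz, "Hz")
print("loudest frequency in last frame:", np.argmax(np.abs(X[:, -1])) * bin_hz, "Hz")
```

Each column is the spectrum of one window position, so the loudest bin moves from the neighbourhood of 440 Hz to the neighbourhood of 880 Hz as the window slides over the signal. The magnitudes agree with the formula; the phases differ by a shift per frame, because each frame is indexed from its own start.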

In [None]:
import librosa, librosa.display

#Load the file
data, fr = librosa.load(path="Samples/piano_melody.wav")

#hop between consecutive windows and window (FFT) length, in samples
hop_length = 512 
n_fft = 2048
#STFT
X = librosa.stft(data, n_fft=n_fft, hop_length=hop_length)

S = librosa.amplitude_to_db(abs(X))

plt.figure(figsize=(15, 10))
librosa.display.specshow(S, sr=fr, hop_length=hop_length, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.show()
IPython.display.Audio(data=data, rate=fr)

We can see the musical tones and their harmonics, and how the strength of each frequency varies in time. If we increase the length of the segments, we get better frequency resolution but worse time resolution. If we decrease it, we get better time resolution but worse frequency resolution. Librosa has a nice function for computing a <a href="https://en.wikipedia.org/wiki/Mel-frequency_cepstrum">Mel-spectrogram</a>, which maps the frequency axis onto the perceptual mel scale (roughly logarithmic at higher frequencies). Together with amplitudes in decibels, this lets us see everything much more clearly. <a href="https://www.youtube.com/watch?v=jhziJ1yd9j4">This video</a> shows you the differences between the linear and logarithmic scales.
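The trade-off between the two resolutions is simple arithmetic, sketched below. The helper name `stft_resolution` is made up for this example; the sample rate of 22050 Hz is the rate `librosa.load` resamples to when no `sr` is given.

```python
def stft_resolution(n_fft, sample_rate):
    """Return (window length in seconds, spacing between frequency bins in Hz)."""
    window_seconds = n_fft / sample_rate
    bin_spacing_hz = sample_rate / n_fft
    return window_seconds, bin_spacing_hz

sample_rate = 22050
for n_fft in (512, 2048, 8192):
    window_seconds, bin_spacing_hz = stft_resolution(n_fft, sample_rate)
    print(f"n_fft={n_fft:5d}  window={1000 * window_seconds:6.1f} ms  "
          f"bin spacing={bin_spacing_hz:6.2f} Hz")
```

The two numbers always multiply to 1, so a longer window buys finer frequency bins only at the price of blurring events in time, and the other way around.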

In [None]:
#pass the audio by keyword: newer librosa versions no longer accept it positionally
S = librosa.feature.melspectrogram(y=data, sr=fr, n_fft=4096, hop_length=hop_length)
logS = librosa.power_to_db(abs(S))
plt.figure(figsize=(15, 10))
librosa.display.specshow(logS, sr=fr, hop_length=hop_length, x_axis='time', y_axis='mel')
plt.title("Mel-spectrogram")
plt.show()
IPython.display.Audio(data=data, rate=fr)

It's definitely better. The graph shows all 5 tones of the recording. Awesome, we know the movement of the melody.

This is a perfect way to encode the melody as "U" for "UP", "D" for "DOWN" and "S" for "STAYS THE SAME". Then we can try to compare it to other encoded melodies and see the result, but I'll keep this as an interesting exercise for you :D
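As a starting point for that exercise, the encoding step itself could look like the sketch below. It assumes the pitches have already been extracted as a list of frequencies; the function name `melody_contour` and the five example frequencies are hypothetical and are not read from the recording. A very similar scheme is known as Parsons code, which usually writes "R" (repeat) where this paper uses "S".

```python
def melody_contour(pitches, tolerance=0.0):
    """Encode a pitch sequence as U (up), D (down) and S (same) between neighbours."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current - previous > tolerance:
            symbols.append("U")
        elif previous - current > tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

# Hypothetical five-note melody given as frequencies in Hz.
notes_hz = [261.63, 293.66, 293.66, 329.63, 261.63]
print(melody_contour(notes_hz))
```

The `tolerance` argument is there because pitches estimated from a spectrogram are never exactly equal; a small value in Hz keeps tiny estimation errors from turning an "S" into a "U" or "D".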

### Conclusion

The idea of this paper is to present the relation between mathematics and music in two different aspects: in pure music theory and in audio processing. We've illustrated it with many examples, such as EQs, audio filters and the Fourier Transform. Music is a really powerful art, and when we combine it with math we can do a lot of things. We can classify different timbres by extracting different features, we can analyze recordings, we can make our songs sound nice and balanced using EQ, etc. The most powerful part comes when we combine music theory with audio processing and math. Computers may be able to compose and orchestrate new songs and analyze their spectra to make them sound nicer. For me, it will be an adventure to work in this direction, because it could bring the music we know to the next level.

#### References

<a href="https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw">3Blue1Brown youtube channel</a>, which is just awesome :D

<a href="http://www.songstuff.com/recording/article/equalization_eq/">Information about EQs</a>

<a href="http://greenteapress.com/thinkdsp/html/thinkdsp002.html">Thinkdsp book</a>

<a href="http://www.musimathics.com/">A book which I found during the research. It seems interesting</a>

<a href="https://www.allaboutcircuits.com/technical-articles/an-introduction-to-filters/">https://www.allaboutcircuits.com/technical-articles/an-introduction-to-filters/</a>

<a href="https://librosa.github.io/librosa/">"Librosa"</a>

<a href="https://musicinformationretrieval.com/">Nice notebook </a>

<a href="https://producelikeapro.com/blog/eq-cheat-sheet/">Cheat sheet</a>

<a href='https://en.wikipedia.org/wiki/Nyquist_frequency'>Nyquist frequency</a>

<a href="https://www.landr.com/online-audio-mastering/">Online Audio Mastering</a>

<a href="https://en.wikipedia.org/wiki/Short-time_Fourier_transform">STFT Wiki</a>

<a href="https://www.youtube.com/watch?v=jhziJ1yd9j4">Logarithmic scale video</a>