## Exploring Sound Generation with the Python wave module

Have you ever wondered how a computer is able to play a sound file, or how we are able to record and store sound?
How does a sequence of zeroes and ones get converted into physical sound that comes out of the speakers?
How are real-life analogue sounds saved as zeroes and ones?

In this notebook, we will explore the physics of sound and how the information is stored digitally inside a raw WAV file.

1. Writing a WAV file
2. Understanding the data format
3. Physics of sound
4. Representing a note
5. Representing a melody
6. Writing a simple song and saving as a WAV

#### Prerequisites:

* Basic understanding of Python
* Basic understanding of a sine function
* Basic understanding of Newtonian mechanics
* Imagination

### 1. Writing a WAV file

In this notebook, we will be using a standard Python module called *wave* that provides basic file writing and reading functionalities. First we will generate a WAV file with 10 seconds of white noise, just to see how the wave module works.

The code may look a little confusing at first, but we will go through the code step-by-step in Part 2 of this notebook.

You can also read [this great article](http://soledadpenades.com/2009/10/29/fastest-way-to-generate-wav-files-in-python-using-the-wave-module/) on generating wave files, written by [Soledad Penades](https://github.com/sole).

Let's start by importing the *wave* module.

In [1]:
import wave

Next, we will also import:
* *random* module for generating random bits for the random noise
* *struct* module for manipulating bytes

In [2]:
import random, struct

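Now we can use the wave module to create an instance of the file that we will be writing. The *output* directory must already exist next to this notebook; otherwise the call below fails because the file cannot be created.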

In [3]:
wav_file = wave.open('output/myNoise.wav', 'w')

According to the [documentation](https://hg.python.org/cpython/file/3.5/Lib/wave.py) in the source of the wave module, "You should set the parameters before the first writeframesraw or writeframes."

To set the parameters, we will call the [setparams((nchannels, sampwidth, framerate, nframes, comptype, compname))](https://docs.python.org/3/library/wave.html#wave.Wave_write.setparams) method on the *wav_file* instance.

The parameters will be:
* nchannels (number of audio channels) : 2
* sampwidth (number of bytes per audio sample) : 2
* framerate (number of samples per second) : 44100
* nframes (the number of audio frames written to the header) : 0
* comptype (compression type) : 'NONE'
* compname (human-readable compression type) : 'not compressed'

We will be reusing the value for framerate, so we can assign it to a variable *FRAMERATE*.

In [4]:
FRAMERATE = 44100
wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))

Next we will create the bytes representing the white noise.

In [5]:
SOUND_LENGTH = 10
bits = []
for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
    bit = random.randint(-32767, 32767)
    packed_bit = struct.pack('h', round(bit))
    bits.append(packed_bit)
    bits.append(packed_bit)
bits = (b''.join(bits))

Finally, we write these bytes into the file, and close the file to complete the process.

In [6]:
wav_file.writeframes(bits)
wav_file.close()

The code provided so far should have generated a file called "myNoise.wav" inside the *output* directory next to this notebook.
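
One way to confirm that a file was written as intended is to open it again in read mode and inspect its header. The sketch below is self-contained: it writes a short noise file into a temporary directory and reads the parameters back with *getparams()*. The helper *write_noise* and the file name *check.wav* are made up for this illustration and are not part of the notebook.

```python
import os
import random
import struct
import tempfile
import wave

FRAMERATE = 44100

def write_noise(path, seconds):
    """Write `seconds` of stereo 16-bit white noise to `path` (illustrative helper)."""
    frames = []
    for _ in range(round(FRAMERATE * seconds)):
        sample = struct.pack('h', random.randint(-32767, 32767))
        frames.append(sample + sample)  # same sample for left and right
    with wave.open(path, 'wb') as wav_file:
        wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))
        wav_file.writeframes(b''.join(frames))

path = os.path.join(tempfile.mkdtemp(), 'check.wav')
write_noise(path, 0.1)

# Re-open the file for reading and look at the header values
with wave.open(path, 'rb') as wav_file:
    params = wav_file.getparams()

duration = params.nframes / params.framerate
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print(duration)
```

Although we passed nframes=0 to *setparams*, the frame count read back reflects what was actually written, because *writeframes* updates the header.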

*Warning!* You might want to turn the volume down :)
[/output/myNoise.wav](/notebooks/output/myNoise.wav)



The following is the entire script for this section:
```
import wave, random, struct

FRAMERATE = 44100
SOUND_LENGTH = 10

wav_file = wave.open('output/myNoise.wav', 'w')
wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))

bits = []
for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
    bit = random.randint(-32767, 32767)
    packed_bit = struct.pack('h', round(bit))
    bits.append(packed_bit)
    bits.append(packed_bit)
bits = (b''.join(bits))
wav_file.writeframes(bits)
wav_file.close()
```

### 2. Understanding the data format

The general process for generating a wave file in Python, as seen in Part 1, is:

1. Create a wave file
2. Build some sound data
3. Write the data to the wave file

Steps 1 and 3 are pretty straightforward. In fact, we can take care of both by using the Python *with* statement:

In [7]:
import wave, random, struct

FRAMERATE = 44100
SOUND_LENGTH = 10

with wave.open('output/myNoise.wav', 'w') as wav_file:
    wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))

    bits = []
    for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
        bit = random.randint(-32767, 32767)
        packed_bit = struct.pack('h', round(bit))
        bits.append(packed_bit)
        bits.append(packed_bit)
    bits = (b''.join(bits))
    wav_file.writeframes(bits)

Now that Step 1 & Step 3 are taken care of, we can take a closer look at Step 2 where the data is built. Specifically the code:

```
bits = []
for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
    bit = random.randint(-32767, 32767)
    packed_bit = struct.pack('h', round(bit))
    bits.append(packed_bit)
    bits.append(packed_bit)
bits = (b''.join(bits))
```

In the first line, we declare an empty list named *bits*.
```
bits = []
```

Then we populate the list with values by iterating through a *for* loop.
We go through the for loop frame by frame, for SOUND_LENGTH=10 seconds worth of frames. Since 1 second contains FRAMERATE=44100 frames, we will need to iterate (FRAMERATE * SOUND_LENGTH)=441000 times.
```
for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
```

At each frame, we generate a random value between -32767 and 32767. This value corresponds to the "amplitude" of the sound wave. The value is stored as a 16-bit signed integer (*short* in C, sampwidth=2), which can hold 2^16 = 65536 distinct values from -32768 to 32767. We use the symmetric range -32767 ~ 32767, which covers 2^16 - 1 = 65535 of those values.
```
    bit = random.randint(-32767, 32767)
```
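
The limits of a 16-bit signed integer can be checked directly with the *struct* module. This is a small sketch rather than part of the notebook's script; it uses the format '<h', where the '<' prefix only pins the byte order so that the behaviour is the same on every machine.

```python
import struct

# 'h' is a 16-bit signed integer, so it occupies two bytes
size_in_bytes = struct.calcsize('<h')

# Two's-complement limits for 16 bits
lowest, highest = -2**15, 2**15 - 1

struct.pack('<h', lowest)    # fits
struct.pack('<h', highest)   # fits

# One past the upper limit no longer fits and raises struct.error
try:
    struct.pack('<h', highest + 1)
    overflowed = False
except struct.error:
    overflowed = True

print(size_in_bytes, lowest, highest, overflowed)
```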

The next three lines perform the appropriate tasks to pack the values into bytes, as the method [*writeframes(data)*](https://docs.python.org/3/library/wave.html#wave.Wave_write.writeframes) on the wave file instance requires that the data is provided in bytes.
```
    packed_bit = struct.pack('h', round(bit))
    bits.append(packed_bit)
    bits.append(packed_bit)
```
We use the [*struct.pack(fmt, v1, v2, ...)*](https://docs.python.org/3.0/library/struct.html#struct.pack) method to convert the integer value *bit* into bytes. Since we want it to be represented as a 16-bit signed integer (*short* in C), we use the format 'h'.
Then we append it to *bits* list twice, since we set the nchannels=2 for stereo sound.
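
To see what one stereo frame looks like as bytes, here is a minimal sketch with two arbitrary sample values. It assumes the little-endian format '<h' so that the printed bytes do not depend on the machine; the notebook's plain 'h' uses the machine's native byte order, which gives the same bytes on common little-endian hardware.

```python
import struct

left, right = 1000, -1000

# One stereo frame = left sample followed by right sample, two bytes each
frame = struct.pack('<h', left) + struct.pack('<h', right)

# The same frame, packing both samples in a single call
same_frame = struct.pack('<hh', left, right)

print(frame)                        # b'\xe8\x03\x18\xfc'
print(len(frame))                   # 4
print(struct.unpack('<hh', frame))  # (1000, -1000)
```

Each frame therefore takes nchannels * sampwidth = 4 bytes in the file.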

Finally, we convert the *bits* list into a single bytestring.
```
bits = (b''.join(bits))
```

The *bits* bytestring can now be written to the wave file by providing it as the argument into the method [*writeframes(data)*](https://docs.python.org/3/library/wave.html#wave.Wave_write.writeframes).

To summarize what we are doing here, we are basically creating an array that looks like this:
[ 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, ... ]
where 10, 10 is the amplitude of the sound at frame 1 (left and right channel),
20, 20 is the amplitude at frame 2, 30, 30 is the amplitude at frame 3, and so on.

Once we generate the data as a list of numbers, we simply convert it to a bytestring and save it as a wave file.
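
The round trip from a list of numbers to a bytestring and back can be sketched as follows. This is an alternative to packing one value at a time, not the notebook's approach: it packs the whole list in one *struct.pack* call, assuming the little-endian format '<h' for each value.

```python
import struct

amplitudes = [10, 20, 30, 40, 50, 60]   # one value per frame

# Duplicate each value so that both channels carry the same signal
interleaved = []
for value in amplitudes:
    interleaved.extend([value, value])

# '<12h' means twelve little-endian 16-bit signed integers
data = struct.pack('<%dh' % len(interleaved), *interleaved)

# Unpacking recovers the original numbers (two bytes per value)
recovered = list(struct.unpack('<%dh' % (len(data) // 2), data))

print(interleaved)
print(len(data))
print(recovered == interleaved)
```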

### 3. Physics of sound

Now that we've seen how sound data can be represented as a sequence of numbers between some minimum and maximum value, let us think about what sound is in the physical sense.

Sound is a mechanical phenomenon that can be produced by anything that has momentum (mass * velocity), and it requires a "medium" through which the "wave" can travel. Here, a medium is simply some intervening material with mass; anything other than a vacuum is able to carry sound waves. This mechanical movement carries a specific significance to the observer (the entity capable of sensing the motion) because certain patterns in the motion are recognized and abstracted to concepts such as pitch and volume. If the sound lacks a recognizable pattern, the motion is recognized as "noise". But if the sound has a regular pattern, that pattern is recognized and is given a meaning.

For example, if we pluck the G3 string on the guitar, the string will move up and down 196 times per second. This moving string will push the surrounding air and this motion will propagate through the air particles. Because a guitar string is fixed at both ends and the motion is very regular, the air and the body of the guitar will vibrate regularly as well. The air pushes our eardrums in and out, the eardrums pick up this up and down movement of the air pressure, our brains abstract this movement as a sinusoidal wave at 196 Hz, and then we say, "ah, the G note." Similarly, if we pluck the D3 string, the string will oscillate about 147 times per second. The motion is slower, and we hear the lower D note. Thus, the frequency of an oscillatory motion is interpreted as "pitch".
* On a side note, the fact that a sound wave with twice the frequency of another sound wave is an [octave](https://en.wikipedia.org/wiki/Octave) higher in pitch is purely sublime. A3 = 220Hz, A4 = 440Hz. It reveals this fantastically mathematical property of music. [(Read more about intervals)](https://en.wikipedia.org/wiki/Interval_(music))

If we pluck the same string harder, the air will move at the same frequency but the string will move further up and down. Each air particle will be pushed and pulled a greater distance as it vibrates back and forth, and every time the air pushes the eardrum, it will push it with more force. The brain recognizes this increase in intensity, or pressure, and we perceive that the note is "louder".

To illustrate how sound can be represented as a sequence of numbers, let's take a look at the representation of a middle A note (A4, 440 Hz).
For the sake of simplicity, we will assign an arbitrary unit "p" for the amplitude of the wave.

In one second, an air particle in our ear will move back and forth (or up and down in terms of pressure) periodically 440 times. So every oscillation takes $ 1/440 sec \approx 0.0022727 sec $.

<img src="assets/wave_1.png"/>

The function of the curve is:

$$ p(t) = \sin(\frac{t \times 2 \pi}{period}) = \sin(t \times 2 \pi \times frequency) = \sin(t \times 2 \pi \times 440) $$

This is a continuous representation of the sound wave. But at every t, there is some number $p(t)$. If we were to represent sound as a sequence of numbers, we just pick certain points (sequence of $t$'s) on the curve and assess the $p(t)$ at those points.

In generating a wave file, that is exactly what we are doing; we are sampling the sound at each "frame". The FRAMERATE parameter that we set in Part 1 shows that we are taking 44100 snapshots per second.

<img src="assets/wave_2.png">
The blue numbers on the curve are the numbers that we append to the *bits* list. Notice how the zeroes of the curve do not necessarily fall on an integer frame number; the first oscillation ends at frame 44100/440 ≈ 100.227, which is not an integer. Here we can observe the discrete nature of digital sound.

The function of the curve is:

$$ p(i) = \sin(i \times 2 \pi \times \frac{440}{44100}) $$

Notice how we are just changing the function from

$$ \sin(t \times 2 \pi \times 440) \to \sin(i \times 2 \pi \times \frac{440}{44100}) $$

where we have just divided the argument inside the $\sin$ function by 44100; the domain of the function changes from $T$ to $I$, and since $t(i) = \frac{i}{44100}$, $p(t \to i) = \sin(t \times 2 \pi \times 440) = \sin( \frac{i}{44100} \times 2 \pi \times 440 ) = \sin(i \times 2 \pi \times \frac{440}{44100}) $.
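
The change of variable from time $t$ to frame index $i$ can be checked numerically. The following sketch defines both forms of the function (the names *p_continuous* and *p_discrete* are chosen for this illustration) and confirms that sampling the continuous curve at $t = i / 44100$ gives the same numbers as the discrete formula, up to floating-point rounding.

```python
import math

FRAMERATE = 44100
FREQUENCY = 440

def p_continuous(t):
    """Pressure at time t in seconds, in the arbitrary unit p."""
    return math.sin(t * 2 * math.pi * FREQUENCY)

def p_discrete(i):
    """Pressure at frame number i."""
    return math.sin(i * 2 * math.pi * FREQUENCY / FRAMERATE)

# The first few samples of the curve
for i in range(5):
    print(i, round(p_discrete(i), 4))

# Largest disagreement between the two forms over the first 1000 frames
max_difference = max(
    abs(p_continuous(i / FRAMERATE) - p_discrete(i)) for i in range(1000)
)
print(max_difference < 1e-9)
```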

### 4. Representing a note

For generating 10 seconds of white noise, we built a list of 441,000 (10 seconds times 44100 frames/sec) random integers between -32767 and 32767.
To represent a note, we will need to build the list using the sine function seen in Part 3:
$$ p(i) = \sin(i \times 2 \pi \times \frac{440}{44100}) $$

Let us go ahead and do that.
First we set up the wave file and declare the constants to be used throughout the script. In this example we will build 5 seconds of the middle A (440 Hz) note.

In [8]:
#Python 3
import wave, random, struct, math

FRAMERATE = 44100
SOUND_LENGTH = 5

wav_file = wave.open('output/myNote.wav', 'w')
wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))

Middle A has a frequency of $440 Hz$ and a period of $ \frac{1}{440} sec $, which corresponds to a wavelength of $ \frac{1}{440} sec \times 343 m/sec \approx 0.7795 m$ (in air, at room temperature and atmospheric pressure). The "period" in frames is therefore $ \frac{1}{440} sec \times 44100 frames/sec \approx 100.227 frames $.
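
The unit conversions above can be written out as a few lines of arithmetic. The variable names are chosen for this illustration, and 343 m/sec is the approximate speed of sound in air used in the text.

```python
FRAMERATE = 44100        # frames per second
FREQUENCY = 440          # Hz
SPEED_OF_SOUND = 343     # m/sec, approximate value for air at room temperature

period_seconds = 1 / FREQUENCY                        # time for one oscillation
wavelength_metres = period_seconds * SPEED_OF_SOUND   # distance covered in one period
period_frames = period_seconds * FRAMERATE            # frames per oscillation

print(round(period_seconds, 7))
print(round(wavelength_metres, 4))
print(round(period_frames, 3))
```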

Next, we build the sound data using the sine function.

In [9]:
PERIOD = FRAMERATE / 440
bits = []
for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
    bit = 5000 * math.sin( i * (2 * math.pi / PERIOD) )
    packed_bit = struct.pack('h', round(bit))
    bits.append(packed_bit)
    bits.append(packed_bit)
bits = (b''.join(bits))
wav_file.writeframes(bits)
wav_file.close()

The only line that changed from the code for white noise is the line:
```
    bit = random.randint(-32767, 32767)
```
which changed to:
```
    bit = 5000 * math.sin( i * (2 * math.pi / PERIOD) )
```
in the code for generating middle A. Here we are using a coefficient of 5000 for the sine function so that the amplitude is clearly audible while staying well within the range (-32767, 32767); 5000 is roughly 15% of the maximum.
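
To put the coefficient in perspective, the sketch below expresses it as a fraction of the largest 16-bit value and, as an extra that the notebook does not use, in decibels relative to full scale. The helper *amplitude_for* is hypothetical and simply goes the other way, from a fraction to a coefficient.

```python
import math

FULL_SCALE = 32767   # largest value of a 16-bit signed sample
AMPLITUDE = 5000     # coefficient used for the sine wave

fraction = AMPLITUDE / FULL_SCALE
decibels = 20 * math.log10(fraction)   # level relative to full scale

def amplitude_for(fraction_of_full_scale):
    """Return the integer coefficient for a given fraction of full scale."""
    return round(fraction_of_full_scale * FULL_SCALE)

print(round(fraction, 3))
print(round(decibels, 1))
print(amplitude_for(1.0))
```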

The following is the entire script for generating 5 seconds of middle A:
```
#Python 3
import wave, random, struct, math

FRAMERATE = 44100
SOUND_LENGTH = 5
PERIOD = FRAMERATE / 440

with wave.open('output/myNote.wav', 'w') as wav_file:
    wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))

    bits = []
    for i in range(0, round(FRAMERATE * SOUND_LENGTH)):
        bit = 5000 * math.sin( i * (2 * math.pi / PERIOD) )
        packed_bit = struct.pack('h', round(bit))
        bits.append(packed_bit)
        bits.append(packed_bit)
    bits = (b''.join(bits))
    wav_file.writeframes(bits)
```

Here is the output of the script: [/output/myNote.wav](/notebooks/output/myNote.wav)
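
If you cannot listen to the file, a rough way to check the pitch is to count how often the signal crosses zero on its way up, since that happens once per oscillation. This is a sketch that rebuilds one second of samples in memory instead of reading the file; the count can be off by one depending on where the last oscillation ends.

```python
import math

FRAMERATE = 44100
FREQUENCY = 440
PERIOD = FRAMERATE / FREQUENCY

# One second of the same samples that the script packs into bytes
samples = [
    round(5000 * math.sin(i * (2 * math.pi / PERIOD)))
    for i in range(FRAMERATE)
]

# Count the frames where the signal goes from negative to non-negative
crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

print(crossings)   # close to FREQUENCY
```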

### 5. Representing a melody

To create a melody, we are going to abstract the process of creating a note into a *class* so that it can be easily reproduced. A Note class will have a method to easily generate a list of values, and we can append these lists together to create a melody. The class won't do any of the file processing; we will only need to save the file once when the entire list representing the melody is ready.

In [10]:
FRAMERATE = 44100

class Note():
    def __init__(self, frequency=440, length=1):
        self.frequency = frequency
        self.length = length
        self.period = FRAMERATE / self.frequency
    
    def toBytes(self):
        bits = []
        for i in range(0, round(self.length * FRAMERATE)):
            bit = 5000 * math.sin( i * (2 * math.pi / self.period) )
            packed_bit = struct.pack('h', round(bit))
            bits.append(packed_bit)
            bits.append(packed_bit)
        bits = (b''.join(bits))
        return bits

The *\_\_init\_\_* method of the Note class takes in two arguments: frequency (in Hz) and length (in sec).
The *toBytes* method returns a bytestring representing that note: the amplitude at each frame is packed as a 16-bit integer, once per channel.
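
The size of the result is easy to predict: frames * channels * bytes per sample. The sketch below uses a standalone function, *note_bytes*, written for this illustration to mirror what *toBytes* does, and compares the length of its output with that product.

```python
import math
import struct

FRAMERATE = 44100
NCHANNELS = 2
SAMPWIDTH = 2

def note_bytes(frequency=440, length=1):
    """Standalone counterpart of Note.toBytes, for illustration only."""
    period = FRAMERATE / frequency
    chunks = []
    for i in range(round(length * FRAMERATE)):
        value = round(5000 * math.sin(i * (2 * math.pi / period)))
        sample = struct.pack('h', value)
        chunks.append(sample * NCHANNELS)   # repeat the sample for each channel
    return b''.join(chunks)

data = note_bytes(440, 0.5)
expected = round(0.5 * FRAMERATE) * NCHANNELS * SAMPWIDTH

print(len(data), expected)
```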

We further abstract the process of creating multiple notes into a Sequence class. This Sequence class will have the method to write the wave file.

In [11]:
class Sequence():
    def __init__(self):
        self.notes = []
    
    def add(self, note):
        if isinstance(note, Note):
            self.notes.append(note)
        else:
            raise TypeError('Should be a Note object')
    
    def addNote(self, *args, **kwargs):
        self.notes.append(Note(*args, **kwargs))
    
    def writeWav(self, filename):
        wav_file = wave.open(filename, 'w')
        wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))
        bits = b''
        for note in self.notes:
            bits += note.toBytes()
        wav_file.writeframes(bits)
        wav_file.close()

The Sequence class has two methods for adding notes and one for writing the file:
* The *add* method takes in a Note object and appends it to a list named *notes*.
* The *addNote* method takes in two arguments - frequency and length - that are used to construct a Note instance before appending it to the *notes* list.
* The *writeWav* method takes in a string argument - the file path - and writes a wave file. It does this by iterating through the Note objects in the *notes* list and calling the *toBytes* method on each one to generate the bytes.
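
The *writeWav* method builds one large bytestring before writing. A possible variation, shown here as a sketch rather than as the notebook's method, is to call *writeframes* once per note so that no concatenation is needed. To stay self-contained, the example writes into an in-memory *io.BytesIO* buffer instead of a file on disk, and the function *tone* stands in for *Note.toBytes*.

```python
import io
import math
import struct
import wave

FRAMERATE = 44100

def tone(frequency, length):
    """Return stereo 16-bit sine-wave bytes (stand-in for Note.toBytes)."""
    period = FRAMERATE / frequency
    chunks = []
    for i in range(round(length * FRAMERATE)):
        sample = struct.pack('h', round(5000 * math.sin(i * (2 * math.pi / period))))
        chunks.append(sample + sample)
    return b''.join(chunks)

buffer = io.BytesIO()
with wave.open(buffer, 'wb') as wav_file:
    wav_file.setparams((2, 2, FRAMERATE, 0, 'NONE', 'not compressed'))
    # Write each note as soon as it is generated
    for frequency in (261.626, 293.665, 329.628):
        wav_file.writeframes(tone(frequency, 0.1))

# Read the result back to count the frames that were written
buffer.seek(0)
with wave.open(buffer, 'rb') as wav_file:
    total_frames = wav_file.getnframes()

print(total_frames)
```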

Now we will use the two classes - Note and Sequence - that we've just written to generate a very simple melody: Do - Re - Mi

* Do : C4 : 261.626 Hz
* Re : D4 : 293.665 Hz
* Mi : E4 : 329.628 Hz

In [12]:
doremi = Sequence()
doremi.addNote(261.626, 1)
doremi.addNote(293.665, 1)
doremi.addNote(329.628, 1)
doremi.writeWav('output/myMelody.wav')

Check out the melody: [/output/myMelody.wav](/notebooks/output/myMelody.wav)

### 6. Writing a simple song and saving as a WAV

Now, just for fun, let's write the famous "Twinkle Twinkle Little Star" tune.
I've put modified versions of the Note and Sequence classes into a module called sound.py (included in this notebook directory), and written a helper function to map human-friendly musical notation like "C" or "G" to a frequency.
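
The helper in sound.py is not shown in this notebook, so the following is only a guess at how such a mapping could work, not the actual implementation. The function *note_to_frequency* is hypothetical and assumes scientific pitch notation, equal temperament with A4 = 440 Hz, and natural notes only (no sharps or flats).

```python
# Distance of each natural note from C, in semitones
SEMITONES_FROM_C = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def note_to_frequency(name):
    """Convert a name such as 'C4' or 'A4' to a frequency in Hz.

    Hypothetical helper: the octave number increases at every C,
    and each semitone multiplies the frequency by 2 ** (1 / 12).
    """
    letter, octave = name[0].upper(), int(name[1:])
    # Number of semitones between this note and A4
    semitones = SEMITONES_FROM_C[letter] - SEMITONES_FROM_C['A'] + 12 * (octave - 4)
    return 440 * 2 ** (semitones / 12)

for name in ('C4', 'D4', 'E4', 'A4'):
    print(name, round(note_to_frequency(name), 3))
```

Under this convention the A directly above G4 is A4, and A5 is one octave higher at 880 Hz; the helper in sound.py may number octaves differently.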

In [13]:
import sound

BPM = 120
b = 60 / BPM

twinkle = sound.Sequence()
twinkle.addNote('C4', b)
twinkle.addNote('C4', b)
twinkle.addNote('G4', b)
twinkle.addNote('G4', b)
twinkle.addNote('A5', b)
twinkle.addNote('A5', b)
twinkle.addNote('G4', b * 2)
twinkle.addNote('F4', b)
twinkle.addNote('F4', b)
twinkle.addNote('E4', b)
twinkle.addNote('E4', b)
twinkle.addNote('D4', b)
twinkle.addNote('D4', b)
twinkle.addNote('C4', b * 2)
twinkle.writeWav('output/twinkle')

Check out [Twinkle Twinkle](/notebooks/output/twinkle.wav)