<h1 align="center"> pyLDPC Tutorial: Sound </h1>

## update:  03/28/16 - v.0.7.3

<b><font color="red"> Since version 0.7: Coding and decoding functions take tG (Transposed G) instead of G, the coding matrix. Functions that construct it (CodingMatrix and CodingMatrix_systematic) return tG instead of G as well. </font></b> 
<br> 

<b><font color="blue"> Since version 0.8: Coding and decoding functions no longer require conditions on matrices' sizes. The Sound data is adapted to the matrices. </font></b> 

<font color=#0101DF><b> Note: </b> </font> 

 <b> GitHub doesn't support audio in Jupyter cells, so I invite you to open this notebook in <a href="http://nbviewer.jupyter.org/github/hichamjanati/pyldpc/blob/master/pyLDPC-Tutorial-Sound.ipynb?flush_cache=true"> nbviewer</a>. </b> 


This notebook is a user's guide to pyLDPC's Sound sub-module, ldpc_sound.
If you would like to know what each function does and go into the construction details, go to <a href="http://nbviewer.jupyter.org/github/hichamjanati/pyldpc/blob/master/pyLDPC-Sound-Construction.ipynb?flush_cache=true/"> LDPC-Sound Construction Details</a>.

First, before anyone sets high expectations, keep in mind that most audio files have a sampling rate of <b>44100 Hz</b>, which means that decoding only one second of a track is the same as decoding an array of 44100 numbers. Each number is in int16 format and (as you may have seen in the link above) is transformed into a 17-bit binary array. That's <u><b>a lot</b></u> to code and decode!

<b><font color=#A44057> On a 1.5 GHz i5 processor with 4 GB of RAM, coding plus 1 iteration of decoding of a 5-second track takes about 20 minutes! </font></b>
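
To put these numbers in perspective, the short calculation below counts the bits in one second of audio and extrapolates the 20-minute figure to a longer track. It is only a rough sketch: it assumes that run time grows linearly with track length, and the names `bits_per_second` and `estimated_hours` are introduced here for illustration.

```python
SAMPLE_RATE = 44100      # samples per second
BITS_PER_SAMPLE = 17     # bits per sample, as used in this tutorial
bits_per_second = SAMPLE_RATE * BITS_PER_SAMPLE

# Timing quoted above: about 20 minutes for a 5-second track (one decoding iteration).
MINUTES_PER_AUDIO_SECOND = 20 / 5


def estimated_hours(track_seconds):
    """Rough run time in hours, assuming linear scaling with track length."""
    return track_seconds * MINUTES_PER_AUDIO_SECOND / 60


print(bits_per_second)           # 749700 bits for one second of audio
print(estimated_hours(3 * 60))   # 12.0 hours for a 3-minute song
```

Under that assumption, a full album would indeed take days rather than hours.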

If you want to know why we aren't using lower sampling frequencies to make everybody happy, <a href="https://en.wikipedia.org/wiki/44,100_Hz" > this </a> might give you a satisfying answer. (It is a consequence of the <i>Nyquist-Shannon sampling theorem</i>, and I'm not really an expert on the subject.)

In this tutorial, I'll be using <b><u> wav</u></b> audio files because they're easy to read as numpy arrays with <b><u>scipy</u></b>. 

In [3]:
import numpy as np
import pyldpc
from pyldpc import ldpc_sound
from scipy.sparse import csr_matrix
from time import time
from scipy.io import wavfile
import IPython

If I had to sum up Sound Coding and decoding in a few words, I'd say:

1. Make coding and decoding matrices G and H.
2. Read the wav file 1-D array. (If Stereo, take one of the channels)
3. Binarize the audio array [Using functions we'll see later].
4. Apply Coding function to binary array.
5. Apply Decoding function to Coded array. 


# Step 1: Construct a code

To construct a code you need to choose the triplet (n,d_v,d_c). 
Keep in mind that: 

- the rate of the code k/n is approximately equal to 1 - d_v/d_c.
- if high values of k,n are used, transform H and tG to csr format using scipy.sparse.csr_matrix(): it's a compressed format with which calculations are faster. 
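
As a quick check of the two points above, the arithmetic can be written out for the parameters used in this tutorial. This is a minimal sketch; the names `m`, `k_approx` and `density` are chosen here for illustration and are not part of pyldpc.

```python
n, d_v, d_c = 1000, 3, 4

m = n * d_v // d_c            # number of parity-check rows in a regular H
design_rate = 1 - d_v / d_c   # approximate rate k/n
k_approx = n - m              # k is at least this; it grows if rows of H are dependent
density = d_c / n             # fraction of nonzero entries in H (d_c ones per row)

print(m, design_rate, k_approx, density)   # 750 0.25 250 0.004
```

Only 0.4% of the entries of H are nonzero here, which is why a compressed sparse format pays off. The actual k can come out slightly larger than 250 when some rows of H are linearly dependent.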


* *Details about how to choose the matrices' parameters are <a href="http://nbviewer.jupyter.org/github/hichamjanati/pyldpc/blob/master/pyLDPC-Tutorial-Matrices.ipynb?flush_cache=true/">here</a>, but I recommend going through this tutorial with the matrices defined here (or any other matrices) for now, and then optimizing your coding & decoding by changing the matrices.*



In [4]:
n = 1000
d_c = 4
d_v = 3 #In this case k is approximately equal to n/4 (n x (1-3/4))

H = pyldpc.RegularH(n,d_v,d_c)
H,tG = pyldpc.CodingMatrix_systematic(H)

tGs = csr_matrix(tG)
Hs = csr_matrix(H)

k = tG.shape[1]
k

252

# Step 2: Read and binarize the audio file


Let's code/decode a 5-second piano track. 


In [6]:
IPython.display.Audio("Sound/piano/piano.wav")


As mentioned above, coding this file and running one iteration of decoding took 20 to 30 minutes. If you'd like to impress your friends and colleagues by decoding a noisy version of Eminem's latest album, you would probably have to run the algorithm for <b> days </b>. That's simply because most audio files are recorded with a sampling rate (or frequency if you prefer) of 44.1 kHz. If you read a wav file with the function <i> wavfile.read() </i>, you get the tuple <i> frequency, array </i>.

In [3]:
freq, array = wavfile.read("Sound/piano/piano.wav")
print("Frequency: {} samples/sec".format(freq))
print("\nPiano Array:\n",array)
print("\nPiano's shape:",array.shape)
print("\nPiano's type:",array.dtype)

Frequency: 44100 samples/sec

Piano Array:
 [[   0    0]
 [   0    0]
 [   9  766]
 ..., 
 [2024 3144]
 [2046 3960]
 [ -29  -66]]

Piano's shape: (235200, 2)

Piano's type: int16




 The array is, as explained above, a series of int16 numbers. Some files contain several series of numbers (like the one above: 2 columns of 235200 int16 numbers), called <i> channels </i>. The additional channels are not necessary: they provide a stereo effect. That's why we'll code and decode only one channel (the channels are nearly the same, with small differences such as a delay in time, so that the sound seems to come from different directions).

In [4]:
piano = array[:,0]
piano.shape

(235200,)
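
To make the channel layout concrete, here is a small self-contained sketch that builds a tiny stereo WAV in memory and extracts its first channel. It uses the standard-library `wave` and `struct` modules instead of `scipy.io.wavfile`, purely so that it runs without the piano file; the sample values are borrowed from the array printed above.

```python
import io
import struct
import wave

# Build a tiny 16-bit stereo WAV in memory: each frame is a (left, right) pair.
frames = [(0, 0), (9, 766), (2024, 3144), (-29, -66)]
buf = io.BytesIO()
with wave.open(buf, "wb") as wav_out:
    wav_out.setnchannels(2)
    wav_out.setsampwidth(2)          # 2 bytes per sample = int16
    wav_out.setframerate(44100)
    wav_out.writeframes(
        b"".join(struct.pack("<hh", left, right) for left, right in frames)
    )

# Read it back: samples come interleaved as L, R, L, R, ...
buf.seek(0)
with wave.open(buf, "rb") as wav_in:
    n_channels = wav_in.getnchannels()
    rate = wav_in.getframerate()
    raw = wav_in.readframes(wav_in.getnframes())

samples = struct.unpack("<{}h".format(len(raw) // 2), raw)
first_channel = list(samples[0::n_channels])   # plays the role of array[:, 0]
print(rate, first_channel)                     # 44100 [0, 9, 2024, -29]
```

With `scipy.io.wavfile`, a mono file comes back as a 1-D array, so `array[:, 0]` would fail on it; checking `array.ndim` first covers both cases.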

Here is, for example, the first channel of our piano track, which lasts 5.33 seconds: 

<img src="Equations/SoundT1.png">

<br>

Now you realize how big a very short audio file can be! Each number is an int16, which we'll binarize with 17 bits (remember that an int16 can be negative, so its values span a segment of width 2^16). 


In [5]:
print("\nPiano's duration: {} seconds.".format(len(piano)/freq))


Piano's duration: 5.333333333333333 seconds.


Let's binarize it using the function:
```python
ldpc_sound.Audio2Bin (audio array) -> return (audio binary form)
```


In [6]:
piano_bin = ldpc_sound.Audio2Bin(piano)
piano_bin.shape

(235200, 17)
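
One way to obtain 17 bits per sample is a sign bit followed by a 16-bit magnitude. The sketch below illustrates that idea with two hypothetical helpers, `int16_to_bits` and `bits_to_int16`; this layout is an assumption made for illustration and is not necessarily the one used by `Audio2Bin` and `Bin2Audio`.

```python
def int16_to_bits(x):
    """Encode an int16 as 17 bits: a sign bit followed by a 16-bit magnitude."""
    if not -32768 <= x <= 32767:
        raise ValueError("value does not fit in int16")
    sign = 1 if x < 0 else 0
    magnitude = abs(x)
    # Most significant magnitude bit first.
    return [sign] + [(magnitude >> i) & 1 for i in range(15, -1, -1)]


def bits_to_int16(bits):
    """Inverse of int16_to_bits."""
    magnitude = 0
    for b in bits[1:]:
        magnitude = (magnitude << 1) | b
    return -magnitude if bits[0] else magnitude


row = int16_to_bits(-29)
print(len(row), bits_to_int16(row))   # 17 -29
```

Applying such a function to every sample gives one row of 17 bits per sample, which matches the (235200, 17) shape above.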

# Step 3: Code the binary array


```python
ldpc_sound.SoundCoding (tG transposed coding matrix, audio binary form, snr) -> return (coded audio, noisy binary audio) 
```

We choose to code it with SNR = 6 decibels (which corresponds to a noise standard deviation of roughly 0.5, assuming the convention σ = 10^(-SNR/20)). 
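
The link between SNR and noise level can be sketched as follows. This assumes unit-power symbols and the convention SNR = 10·log10(1/σ²), that is σ = 10^(-SNR/20); the helper `noise_std` is hypothetical, and the convention used inside pyldpc should be checked in its source before relying on these numbers.

```python
def noise_std(snr_db):
    """Noise standard deviation for unit-power symbols at a given SNR in dB."""
    return 10 ** (-snr_db / 20)


for snr_db in (4, 6, 8):
    print(snr_db, round(noise_std(snr_db), 3))
```

Under this assumption, 6 dB gives a standard deviation of about 0.50, and each additional 6 dB roughly halves it.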


In [7]:
snr = 6

In [16]:
t = time()
piano_coded, piano_noisy_bin = ldpc_sound.SoundCoding(tGs,piano_bin,snr)
t = time() - t

print("Coding time of {} samples: {}s\n".format(len(piano_bin), t))


Coding time of 235200 samples: 2.9040398597717285s



To listen to the noisy array, you need to transform it from binary to audio format and save it to disk. 

In [17]:
piano_noisy = ldpc_sound.Bin2Audio(piano_noisy_bin)
wavfile.write("Sound/piano/test/piano_noisy.wav",freq,piano_noisy)

Here's what the noisy test sounds like:


In [7]:
IPython.display.Audio("Sound/piano/test/piano_noisy.wav")


<font color="red"> NOTE: I used the csr format of tG (tGs) for coding, as recommended for large codes. Here is the difference when coding is done with the standard numpy format: </font>

In [18]:
t = time()
piano_coded_tG, piano_noisy_bin_tG = ldpc_sound.SoundCoding(tG,piano_bin,snr)
t = time() - t

print("Coding time of {} samples using tG instead of csr format tGs: {}s\n".format(len(piano_bin), t))


Coding time of 235200 samples using tG instead of csr format tGs: 7.882143020629883s





Coding is almost 3 times faster with the csr format! 
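
The speed-up comes from the fact that a compressed sparse row (CSR) matrix stores only the positions of its nonzero entries, so a product never visits the zeros. The toy functions below, `to_csr` and `csr_matvec_mod2`, are written in plain Python for illustration only; they mimic the idea behind `scipy.sparse.csr_matrix` and are not how scipy or pyldpc implement it.

```python
def to_csr(dense):
    """Return (indices, indptr) for a binary matrix; all stored values are 1."""
    indices, indptr = [], [0]
    for row in dense:
        indices.extend(j for j, value in enumerate(row) if value)
        indptr.append(len(indices))
    return indices, indptr


def csr_matvec_mod2(indices, indptr, x):
    """Multiply a binary CSR matrix by a bit vector, modulo 2."""
    return [
        sum(x[j] for j in indices[indptr[i]:indptr[i + 1]]) % 2
        for i in range(len(indptr) - 1)
    ]


dense = [[1, 0, 0, 1],
         [0, 1, 1, 0],
         [1, 1, 0, 0]]
indices, indptr = to_csr(dense)
print(indices, indptr)                                  # [0, 3, 1, 2, 0, 1] [0, 2, 4, 6]
print(csr_matvec_mod2(indices, indptr, [1, 0, 1, 1]))   # [0, 1, 1]
```

Each row costs as many additions as it has ones, instead of one operation per column.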

# Step 4: Decoding ! 

Now decode it with the function:

```python
ldpc_sound.SoundDecoding
```

Let's try one iteration of decoding. 

In [13]:
max_iter = 1 

In [20]:
t = time()
piano_decoded_bin = ldpc_sound.SoundDecoding(tGs,Hs,piano_coded,snr,max_iter)
t = time() - t 

print("Decoding time of {} samples and {} max iterations: {}mn {}s\n".format(len(piano_decoded_bin),max_iter, t//60,int(t%60)))

Decoding time of 235200 samples and 1 max iterations: 18.0mn 14s



Compare the original to the decoded version using *BER_audio* to compute the Bit Error Rate: 

In [21]:
print("Bit Error Rate in %: ",100*ldpc_sound.BER_audio(piano_bin,piano_decoded_bin),"%")

Bit Error Rate in %:  0.0504201680672 %
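
For intuition, a bit error rate is simply the fraction of bits that differ between two sequences. The sketch below uses a hypothetical helper, `bit_error_rate`, on flat lists of bits; it assumes that `BER_audio` computes the same kind of ratio over all bits of the two binary arrays.

```python
def bit_error_rate(reference, decoded):
    """Fraction of positions where two equally long bit sequences differ."""
    if len(reference) != len(decoded):
        raise ValueError("sequences must have the same length")
    errors = sum(a != b for a, b in zip(reference, decoded))
    return errors / len(reference)


sent     = [0, 1, 1, 0, 1, 0, 0, 1]
received = [0, 1, 0, 0, 1, 0, 1, 1]
print(100 * bit_error_rate(sent, received))      # 25.0

# The rate printed above, applied to 235200 x 17 bits, is about 2016 wrong bits.
print(round(0.000504201680672 * 235200 * 17))    # 2016
```

A rate that looks tiny in percent can still mean a couple of thousand flipped bits, which is why the noise remains audible.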


To listen to the decoded version, convert it from binary to audio and save it.

In [22]:
piano_decoded = ldpc_sound.Bin2Audio(piano_decoded_bin)
wavfile.write("Sound/piano/test/piano_decoded.wav",freq,piano_decoded)

Here's what the decoded test sounds like:


In [8]:
IPython.display.Audio("Sound/piano/test/piano_decoded.wav")


The BER is not 0 (and you can still hear the noise). We'll need to increase max_iter for a better decoding:

# Step 5: Increase max_iter if needed

Depending on the channel model you built (the SNR value) and the decoding quality you're hoping for, you may still get a slightly noisy sound (BER > 0) after only one iteration of decoding. Sometimes, usually with small SNRs (less than 6), you may even need hundreds of iterations to get a satisfying result. Going from 1 iteration to 2 won't make a noticeable difference, so it pays to increase max_iter in large steps. Personally, I increase max_iter like this:
1, 15, 50, 100, 200. However, for low values of SNR, perfect decoding may not be possible no matter how much you increase the number of iterations: the noise is in this case too strong to recover 100% of the information.
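
The schedule above can be automated with a small loop that stops as soon as the bit error rate reaches zero. This is a sketch only: `decode_with_schedule` is a hypothetical helper, and `decode` and `ber` are placeholders that the caller supplies. Toy stand-ins are used here so that the example runs in a fraction of a second.

```python
def decode_with_schedule(decode, ber, schedule=(1, 15, 50, 100, 200)):
    """Try increasing iteration limits until the bit error rate reaches zero.

    decode(max_iter) returns a decoded array; ber(decoded) returns its error rate.
    Both are placeholders supplied by the caller.
    """
    results = []
    decoded = None
    for max_iter in schedule:
        decoded = decode(max_iter)
        rate = ber(decoded)
        results.append((max_iter, rate))
        if rate == 0:
            break                      # good enough, skip the longer runs
    return decoded, results


# Toy stand-ins: pretend the errors vanish once 15 iterations are allowed.
toy_decode = lambda max_iter: max_iter
toy_ber = lambda decoded: 0.0 if decoded >= 15 else 0.0005
print(decode_with_schedule(toy_decode, toy_ber)[1])   # [(1, 0.0005), (15, 0.0)]
```

In this tutorial, `decode` could wrap `ldpc_sound.SoundDecoding(tGs,Hs,piano_coded,snr,max_iter)` and `ber` could wrap `ldpc_sound.BER_audio(piano_bin,decoded)`. Keep in mind that every step restarts decoding from scratch, so the run times add up.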



In [23]:
max_iter = 20
t = time()
piano_decoded_bin = ldpc_sound.SoundDecoding(tGs,Hs,piano_coded,snr,max_iter)
t = time() - t 

print("Decoding time of {} samples and {} max iterations: {}mn {}s\n".format(len(piano_decoded_bin),max_iter, t//60,int(t%60)))

Decoding time of 235200 samples and 20 max iterations: 26.0mn 26s



In [24]:
print("Bit Error Rate in %: ",100*ldpc_sound.BER_audio(piano_bin,piano_decoded_bin),"%")

Bit Error Rate in %:  0.0 %


In [25]:
piano_decoded = ldpc_sound.Bin2Audio(piano_decoded_bin)
wavfile.write("Sound/piano/test/piano_decoded_20.wav",freq,piano_decoded)

<h4> Tada! Our decoding is now error-free (BER of 0.0 %)! No need to add more iterations! </h4>

<font color=#A44057> <h2> Application to songs </h2> </font>

As I said before, you can apply the method described in this tutorial to a whole song by first cutting your 3-minute track into many (too many) 5-second tracks, then concatenating the whole mess! You will need to make sure that no file gets overwritten! 
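
The cutting step can be sketched as follows. The helpers `chunk_bounds` and `chunk_name` are hypothetical and only compute sample indices and file names; the coding, decoding and saving of each chunk would follow the steps shown earlier in this tutorial.

```python
def chunk_bounds(n_samples, freq, seconds=5):
    """Return (start, stop) sample indices of consecutive chunks of a track."""
    step = int(freq * seconds)
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


def chunk_name(index, prefix="chunk"):
    """Zero-padded file name, so that no chunk overwrites another."""
    return "{}_{:04d}.wav".format(prefix, index)


freq = 44100
n_samples = 3 * 60 * freq                     # a 3-minute mono track
bounds = chunk_bounds(n_samples, freq)
print(len(bounds), chunk_name(0), chunk_name(len(bounds) - 1))
# 36 chunk_0000.wav chunk_0035.wav
```

Slicing the audio array with each `(start, stop)` pair and concatenating the decoded chunks in the same order gives back a track of the original length.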
