In [1]:
import audio_processing as ap
import audio_utils as au
import math
import numpy.fft as fft
import matplotlib.pyplot as plt
%matplotlib inline

# CS 328 Final Project
## Thomas Bertschinger and Sanders McMillan
### Lab Notebook

### Tuesday, April 26, 2016 (joint entry)

Our final project will (likely) be *model* focused rather than experiment focused. We will "implement computational model(s) and compare results to existing human data." In this case, the human data will be musical compositions created by humans (*e.g.* Bach fugues) and our model will be a system that "composes" music (perhaps learning from human compositions).

We set up a bibliography using LaTeX and BibTeX, and a git repository to keep track of our code and materials.

https://github.com/bertschingert/cs328-final/tree/master/references

### Wednesday, April 27, 2016 (joint entry)

We are thinking about doing a signal-processing based model that creates a representation of a signal and can classify the signal into categories such as instrument family, voice, etc. We will need to create a representation of audio sound in a (hopefully) small number of dimensions that includes enough relevant information so that we can classify a sound into a category such as brass versus string instrument, for example. 

The representation will likely include attributes such as attack and decay time of the waveform; the spectrum at various points in time; how much the spectrum changes over time; irregularities in the spectrum. 

Our model will ideally be able to classify instruments correctly at different amplitudes and pitches. It would also be important to limit the model to audio information that humans can actually perceive. (For example, it wouldn't make sense for our model to take into account frequencies substantially above 20,000 Hz because humans cannot hear that high.)
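
A minimal sketch of how the audible-range restriction could be applied to a spectrum, assuming the spectrum comes from `numpy.fft.rfft`. The function name `audible_bins` and the 20 Hz lower bound are illustrative assumptions, not part of our project code.

```python
import numpy as np

def audible_bins(spectrum, sample_rate, n_samples, low_hz=20.0, high_hz=20000.0):
    """Keep only the rfft bins whose frequency lies in [low_hz, high_hz]."""
    # Frequency (in Hz) of each rfft bin for a signal of n_samples samples.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return freqs[mask], np.asarray(spectrum)[mask]

# Demo: a 440 Hz tone plus a 30 kHz tone that humans cannot hear.
sample_rate = 96000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 30000 * t)
spectrum = np.fft.rfft(signal)
freqs, kept = audible_bins(spectrum, sample_rate, len(signal))
strongest = freqs[np.argmax(np.abs(kept))]
```

After filtering, the 30 kHz component is gone and the strongest remaining bin is the 440 Hz tone.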

Being able to pull out the relevant information from an audio signal is important because it will help us understand how humans can do things such as distinguish different people's voices. We know probably hundreds of different voices that we can identify from only a few words of speech. This is also important for being able to recognize different instruments present in a single audio signal. 

We think it would be plausible to create a neural network to identify different audio signals. We will first take an audio signal and use tools such as the discrete Fourier transform to create a suitable representation of the signal that omits extraneous information or information that humans cannot perceive. Then, we train a neural network to be able to identify, from the features of the representation, what category the signal belongs to.

Theoretically, our model will be grounded in the similarity models learned earlier in this course, in addition to Gibsonian and Gestalt principles of perceiving invariants in stimuli and grouping similar and proximal stimuli as belonging to the same perceptual unit (e.g. being able to classify a signal as being of a certain category regardless of its amplitude and pitch, and classifying/perceiving similar successive waveforms as being from the same instrument). It will also involve the place and time theories for how humans transform air pressure hitting the ear into an auditory representation (which is where our Fourier transform and representation stages come in).

### Monday, May 16, 2016 (joint entry)

We have written some code to do basic audio processing (FFT). We are also starting to create the neural network (NN).

In order to keep the neural network consistent, we will have the option to save weights and biases to a text file so that they can be loaded. 
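
One way the save/load option could look, as a sketch only: each layer's matrix goes to its own text file via `numpy.savetxt` and is read back with `numpy.loadtxt`. The helper names `save_layers` and `load_layers` and the file-naming scheme are hypothetical, not what neural_net.py necessarily does.

```python
import os
import tempfile
import numpy as np

def save_layers(arrays, directory, prefix):
    """Write each 2-D array to <directory>/<prefix>_<i>.txt."""
    for i, a in enumerate(arrays):
        np.savetxt(os.path.join(directory, "%s_%d.txt" % (prefix, i)), np.atleast_2d(a))

def load_layers(directory, prefix, n_layers):
    """Read the arrays back in layer order, always as 2-D arrays."""
    return [np.loadtxt(os.path.join(directory, "%s_%d.txt" % (prefix, i)), ndmin=2)
            for i in range(n_layers)]

# Demo: round-trip two randomly initialised weight matrices.
rng = np.random.default_rng(0)
weights = [rng.random((5, 3)), rng.random((3, 5))]
with tempfile.TemporaryDirectory() as d:
    save_layers(weights, d, "weights")
    restored = load_layers(d, "weights", len(weights))
```

The same two helpers would work for the biases by using a different prefix.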

In [2]:
left, right = au.read_raw_stereo("audio_files/violin-a440.raw")
chunk = left[:44100]
freqs = fft.rfft(chunk)
au.graph_fft(freqs[:10000])

FileNotFoundError: [Errno 2] No such file or directory: 'audio_files/violin-a440.raw'

We also started compiling our audio sample library. The audio samples are 1.5-second clips downloaded from http://www.philharmonia.co.uk/explore/make_music. So far we have 5 different instruments (guitar, saxophone, flute, violin, and trumpet), each at three different pitches (A, C, and E). The octaves of the pitches differ between instruments, because the library does not contain every octave for each instrument and each instrument has its own pitch range.

### Tuesday, May 17, 2016 (joint entry)

We are starting to write code that uses the FFT to get some information on the signal, such as attack time. This will be helpful for our representation. We are also starting to implement the neural network. 
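
For reference, a sketch of one common definition of attack time: the time the amplitude envelope takes to rise from 10% to 90% of its peak. This version works on the time-domain envelope rather than on the FFT, so it is a stand-in for illustration and not necessarily the method in our code. The function name, frame size, and thresholds are assumptions.

```python
import numpy as np

def attack_time(signal, sample_rate, frame=256, low=0.1, high=0.9):
    """Seconds for the peak envelope to rise from low*peak to high*peak."""
    x = np.abs(np.asarray(signal, dtype=float))
    n_frames = len(x) // frame
    # Envelope: maximum absolute sample within each non-overlapping frame.
    envelope = x[:n_frames * frame].reshape(n_frames, frame).max(axis=1)
    peak = envelope.max()
    start = np.argmax(envelope >= low * peak)   # first frame above the low threshold
    end = np.argmax(envelope >= high * peak)    # first frame above the high threshold
    return (end - start) * frame / sample_rate

# Demo: a 440 Hz tone whose amplitude ramps up linearly over 0.1 s.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
ramp = np.minimum(t / 0.1, 1.0)
tone = ramp * np.sin(2 * np.pi * 440 * t)
measured = attack_time(tone, sample_rate)
```

For the linear ramp in the demo, the 10% to 90% rise should take roughly 0.08 s, limited by the frame resolution.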

### Saturday, May 21, 2016

Created the Hann window function. 

In [None]:
# A constant signal of 100 ones, so the plot shows the window shape itself
test_signal = [1] * 100
w = ap.hann_window(test_signal)
au.graph_signal(w)
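
A sketch of what a function like `ap.hann_window` might compute, assuming it multiplies the signal by a symmetric Hann window of the same length. The name `hann_window_sketch` is hypothetical; the formula is the standard Hann definition and matches `numpy.hanning`.

```python
import numpy as np

def hann_window_sketch(signal):
    """Multiply the signal by a symmetric Hann window of equal length (N >= 2)."""
    x = np.asarray(signal, dtype=float)
    n = np.arange(len(x))
    # w[n] = 0.5 * (1 - cos(2*pi*n / (N - 1))), which is zero at both ends.
    window = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (len(x) - 1)))
    return x * window

windowed = hann_window_sketch(np.ones(100))
```

Applied to a signal of all ones, as in the cell above, the result is the window itself.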

Created code to compute the spectral centroid.

In [None]:
wave = au.create_sine_wave(440, 1, 1)
spectrum = fft.rfft(wave)
print("before window: spectral centroid is ", ap.get_spectral_centroid(spectrum, 44100))
au.graph_fft(spectrum[:500])

In [None]:
w_signal = ap.hann_window(wave)
w_spectrum = fft.rfft(w_signal)
print("after window: spectral centroid is ", ap.get_spectral_centroid(w_spectrum, 44100))
au.graph_fft(w_spectrum[:500])
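
The spectral centroid is the magnitude-weighted mean frequency of the spectrum. A minimal sketch under two assumptions: the spectrum comes from `numpy.fft.rfft`, and the original signal had an even number of samples, so its length can be recovered from the number of bins. The name `spectral_centroid_sketch` is hypothetical and may differ from `ap.get_spectral_centroid` in its details.

```python
import numpy as np

def spectral_centroid_sketch(spectrum, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of an rfft spectrum."""
    magnitudes = np.abs(spectrum)
    n_samples = 2 * (len(magnitudes) - 1)  # assumes an even-length input signal
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # silent input: no meaningful centroid
    return float((freqs * magnitudes).sum() / total)

# Demo: a pure 440 Hz sine, one second at 44.1 kHz.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
centroid = spectral_centroid_sketch(np.fft.rfft(np.sin(2 * np.pi * 440 * t)), sample_rate)
```

For a pure tone that falls exactly on an FFT bin, the centroid sits at the tone's frequency.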

### Sunday, May 22, 2016

Now using Python's wave module to read .wav files. Created the function read_wav_mono in audio_utils.py, which returns a list of the samples.

In [None]:
f = au.read_wav_mono('audio_files/guitar_A4_very-long_forte_normal.wav')
au.graph_signal(f)
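
A sketch of how a reader like `read_wav_mono` can be built on the standard-library `wave` module, assuming 16-bit mono PCM input. The name `read_wav_mono_sketch` and the format check are illustrative assumptions; the real function in audio_utils.py may handle other formats.

```python
import os
import struct
import tempfile
import wave

def read_wav_mono_sketch(path):
    """Return the samples of a 16-bit mono PCM .wav file as a list of ints."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

# Round-trip check on a tiny file written with the same module.
original = [0, 1000, -1000, 32767, -32768]
path = os.path.join(tempfile.mkdtemp(), "sketch.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(struct.pack("<%dh" % len(original), *original))
samples = read_wav_mono_sketch(path)
```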

In [None]:
step = 44100 // 4  # a quarter of a second at 44.1 kHz
start = 44100
chunk = f[start:start+step]
s = fft.rfft(chunk)
au.graph_fft(s, 1000)

In [None]:
w = ap.hann_window(chunk)
s = fft.rfft(w)
au.graph_fft(s, 1000)

In [None]:
s = ap.spectral_flux(f)
au.graph_signal(s)
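
Spectral flux measures how much the magnitude spectrum changes from one frame to the next. A sketch of one common formulation (Euclidean distance between successive Hann-windowed magnitude spectra). The frame size, hop size, and the name `spectral_flux_sketch` are assumptions, and `ap.spectral_flux` may normalise or frame the signal differently.

```python
import numpy as np

def spectral_flux_sketch(signal, frame_size=1024, hop=512):
    """Euclidean distance between magnitude spectra of successive frames."""
    x = np.asarray(signal, dtype=float)
    starts = range(0, len(x) - frame_size + 1, hop)
    if len(starts) < 2:
        return np.zeros(0)  # need at least two frames to measure change
    window = np.hanning(frame_size)
    mags = np.array([np.abs(np.fft.rfft(x[s:s + frame_size] * window)) for s in starts])
    diffs = np.diff(mags, axis=0)
    return np.sqrt((diffs ** 2).sum(axis=1))

# Demo: a tone that jumps from 440 Hz to 880 Hz halfway through one second.
sample_rate = 44100
t = np.arange(sample_rate // 2) / sample_rate
tone = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t)])
flux = spectral_flux_sketch(tone)
jump_frame = int(np.argmax(flux))
```

The flux stays near zero while the tone is steady and peaks around the frame where the pitch changes.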

In [None]:
f2 = au.read_wav_mono('audio_files/saxophone_A4_15_forte_normal.wav')
au.graph_signal(f2)

In [None]:
s2 = ap.spectral_flux(f2)
au.graph_signal(s2)
for i in s2:
    print(i)

### Thursday, May 26, 2016

We finished writing code for our neural network, and are now testing and debugging it. Our neural network has an initialize_network function that initializes the weights and nodes of the network based on the input length, the output length, the number of hidden units, and the number of layers of the network. It also has a set_hidden_units function that allows you to set the number of hidden units at any particular hidden layer, and changes the weights accordingly.
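
A sketch of how these two functions could build the weight matrices. The argument order `(n_inputs, n_outputs, n_layers, n_hidden)`, the 1-based hidden-layer index, and the uniform [0, 1) initialisation are inferred from the calls and printed shapes in the test cells of this entry; they are assumptions about neural_net.py, and the `_sketch` names are hypothetical. Unlike the real module, this version returns the weights instead of storing them in module state.

```python
import numpy as np

def initialize_network_sketch(n_inputs, n_outputs, n_layers, n_hidden, rng=None):
    """Return one (n_out, n_in) weight matrix per layer, uniform in [0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    # n_layers weight matrices means n_layers - 1 hidden layers.
    sizes = [n_inputs] + [n_hidden] * (n_layers - 1) + [n_outputs]
    return [rng.random((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def set_hidden_units_sketch(weights, layer, n_units, rng=None):
    """Resize hidden layer `layer` (1-based) and re-draw the two adjacent matrices."""
    rng = np.random.default_rng() if rng is None else rng
    new = list(weights)
    new[layer - 1] = rng.random((n_units, weights[layer - 1].shape[1]))  # into the layer
    new[layer] = rng.random((weights[layer].shape[0], n_units))          # out of the layer
    return new

rng = np.random.default_rng(0)
weights = initialize_network_sketch(3, 3, 3, 5, rng)
resized = set_hidden_units_sketch(weights, 1, 2, rng)
shapes = [w.shape for w in weights]
resized_shapes = [w.shape for w in resized]
```

With these arguments the shapes are (5, 3), (5, 5), (3, 5) before resizing and (2, 3), (5, 2), (3, 5) after.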

In [1]:
# Testing the initialize network function
import neural_net as nn
import numpy as np
nn.initialize_network(3, 3, 3, 5)

[array([[ 0.75551963,  0.17679734,  0.69249921],
        [ 0.22099658,  0.00146734,  0.51790847],
        [ 0.9460001 ,  0.28665973,  0.20116325],
        [ 0.54648863,  0.76805964,  0.36519626],
        [ 0.86281637,  0.92525577,  0.50456145]]),
 array([[ 0.36857105,  0.39772708,  0.56079839,  0.95727679,  0.05452356],
        [ 0.94444942,  0.15472501,  0.75261986,  0.32684863,  0.04950202],
        [ 0.06437716,  0.82002242,  0.12235303,  0.53283632,  0.25162644],
        [ 0.11549916,  0.91078775,  0.04357679,  0.03492126,  0.20962077],
        [ 0.84396591,  0.15039314,  0.1281514 ,  0.87396488,  0.24372387]]),
 array([[ 0.0079272 ,  0.85914443,  0.19722236,  0.33233457,  0.88069444],
        [ 0.01260716,  0.69935568,  0.49910774,  0.98161941,  0.26365055],
        [ 0.92105016,  0.76617951,  0.95917952,  0.4227643 ,  0.73658802]])]

In [2]:
# Testing the set_hidden_units function
nn.set_hidden_units(1, 2)

[array([[ 0.82380233,  0.10330766,  0.86887   ],
        [ 0.11343192,  0.77067338,  0.78214322]]),
 array([[ 0.86098025,  0.07841429],
        [ 0.52739794,  0.04340229],
        [ 0.02358108,  0.30241393],
        [ 0.72622984,  0.4097042 ],
        [ 0.22033241,  0.29004932]]),
 array([[ 0.0079272 ,  0.85914443,  0.19722236,  0.33233457,  0.88069444],
        [ 0.01260716,  0.69935568,  0.49910774,  0.98161941,  0.26365055],
        [ 0.92105016,  0.76617951,  0.95917952,  0.4227643 ,  0.73658802]])]

In [3]:
# Testing the update_weights function
# (`input` shadows the Python builtin; the name is kept because later cells reuse it)
input = np.array([[1, 2, 3], [3, 2, 1], [1, 1, 1]])
output = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
nn.update_weights(input, output, 0.01)
print(nn.get_weights())

[array([[ 0.82387289,  0.10341644,  0.86891483],
       [ 0.11349855,  0.77071865,  0.7821712 ]]), array([[ 0.86142329,  0.07885402],
       [ 0.52846751,  0.04446379],
       [ 0.02435383,  0.30318071],
       [ 0.72686696,  0.41033651],
       [ 0.221083  ,  0.29079416]]), array([[ 0.00838452,  0.85956339,  0.19769073,  0.33284901,  0.88109499],
       [ 0.01541263,  0.70192588,  0.50198107,  0.98477537,  0.26610777],
       [ 0.92298001,  0.76794752,  0.96115606,  0.42493525,  0.73827832]])]


In [4]:
(weights, bias) = nn.train_network(input, output, nn.update_weights, 10, 0.01)
print(weights)

Iteration:  0
[[ 0.00838452  0.85956339  0.19769073  0.33284901  0.88109499]
 [ 0.01541263  0.70192588  0.50198107  0.98477537  0.26610777]
 [ 0.92298001  0.76794752  0.96115606  0.42493525  0.73827832]]
Iteration:  1
[[ 0.00883954  0.85998053  0.19815686  0.33336086  0.8814937 ]
 [ 0.01818339  0.70446596  0.50481943  0.98789219  0.26853565]
 [ 0.92489029  0.76969876  0.96311294  0.42708411  0.7399522 ]]
Iteration:  2
[[ 0.0092923   0.86039586  0.19862075  0.33387015  0.8818906 ]
 [ 0.02092018  0.70697653  0.50762353  0.99097068  0.27093478]
 [ 0.92678135  0.77143351  0.96505052  0.42921129  0.74160995]]
Iteration:  3
[[ 0.00974281  0.8608094   0.19908242  0.33437689  0.88228569]
 [ 0.02362371  0.70945817  0.51039408  0.99401164  0.27330575]
 [ 0.92865355  0.77315205  0.96696914  0.43131716  0.74325186]]
Iteration:  4
[[ 0.0101911   0.86122116  0.19954191  0.33488111  0.882679  ]
 [ 0.02629466  0.71191145  0.51313175  0.99701583  0.27564913]
 [ 0.93050724  0.77485467  0.96886914  0.433

In [5]:
# Testing the predict_network function
inp = np.array([[4,4,4]])
print(inp.shape)
print(nn.predict_network(inp.T, nn.logistic))

(1, 3)
[array([[4],
       [4],
       [4]]), array([[ 0.99971575],
       [ 0.99947081]]), array([[ 0.75763062],
       [ 0.6976289 ],
       [ 0.77438655],
       [ 0.85078559],
       [ 0.66584234]]), array([[ 0.85065377],
       [ 0.89018198],
       [ 0.96123967]])]
[[ 0.85065377]
 [ 0.89018198]
 [ 0.96123967]]
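
The printed list above holds the activation of every layer, from the input column to the output column. A sketch of a forward pass that produces a list of that shape, assuming column-vector inputs, one weight matrix and one bias column per layer, and a logistic activation. The name `forward_sketch` and the explicit `biases` argument are assumptions, not the signature of `nn.predict_network`.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_sketch(x, weights, biases, activation=logistic):
    """Return the activations of every layer, starting with the input itself."""
    activations = [np.asarray(x, dtype=float)]
    for w, b in zip(weights, biases):
        activations.append(activation(w @ activations[-1] + b))
    return activations

# Demo with the layer sizes used above: 3 inputs -> 2 -> 5 -> 3 outputs.
rng = np.random.default_rng(0)
sizes = [3, 2, 5, 3]
weights = [rng.random((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]
layers = forward_sketch(np.array([[4.0], [4.0], [4.0]]), weights, biases)
prediction = layers[-1]
```

Because the logistic function maps every value into (0, 1), each output unit can be read as a score for one category.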


In [9]:
# Load one feature file per instrument, in a fixed order
instruments = ["clarinet", "flute", "guitar", "saxophone", "violin"]
data = [np.loadtxt("reps/%s_zcr.txt" % name) for name in instruments]
inp = []
inputs = []
outputs = [[],[],[],[],[]]
for i in range(50):
    for j in range(5):
        inputs.append(data[j][i])
        outputs[j].append(1)
        for k in range(5):
            if k != j:
                outputs[k].append(0)
print(inputs)
inp = [inputs]
print(inp)
inp = np.array(inp)
output = np.array(outputs)
print(output)

nn.initialize_network(1, 5, 3, 5)
nn.train_network(inp, output, nn.update_weights, 100, 0.1)
print(nn.predict_network([inp[0]], nn.logistic))

[0.041541950113378683, 0.027210884353741496, 0.0050793650793650794, 0.021133786848072562, 0.021043083900226758, 0.054965986394557825, 0.018684807256235829, 0.011791383219954649, 0.027936507936507936, 0.024399092970521542, 0.023945578231292518, 0.013242630385487529, 0.012426303854875283, 0.06140589569160998, 0.021405895691609979, 0.026485260770975058, 0.016145124716553289, 0.0025396825396825397, 0.023764172335600908, 0.022947845804988661, 0.038367346938775512, 0.041179138321995462, 0.014149659863945578, 0.023219954648526078, 0.018321995464852608, 0.059954648526077098, 0.020861678004535148, 0.0069841269841269841, 0.024489795918367346, 0.019047619047619049, 0.025759637188208617, 0.023219954648526078, 0.0098866213151927434, 0.028843537414965988, 0.021496598639455782, 0.024399092970521542, 0.018049886621315191, 0.0048979591836734691, 0.066575963718820866, 0.025941043083900227, 0.039365079365079367, 0.015056689342403628, 0.020408163265306121, 0.027664399092970523, 0.021950113378684806, 0.042

TypeError: predict_network() missing 1 required positional argument: 'weight_function'
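
The feature files loaded above are named `*_zcr.txt`, which we read as zero-crossing rate. A sketch of one standard way to compute that feature: the fraction of adjacent sample pairs whose signs differ. The function name and the exact normalisation are assumptions; the values in reps/ may have been produced differently.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    x = np.asarray(signal, dtype=float)
    negative = np.signbit(x)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings / (len(x) - 1)

# Demo: a 440 Hz sine crosses zero about 880 times per second.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
zcr = zero_crossing_rate(np.sin(2 * np.pi * 440 * t))
```

For a pure tone the rate is about twice the frequency divided by the sample rate, so higher pitches and noisier timbres give larger values.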