This lab is an introduction to audio and image processing. You will learn how to use some Python packages that are commonly used in these domains. Part 1 deals with audio, and Part 2 covers images.

Name: 

NetID:

# Part 1 - Loading and Visualizing Digital Audio

In [None]:
from scipy.io import wavfile
from matplotlib import pyplot as plt
import numpy as np

%matplotlib inline

# Plotly initialization

Sound consists of minute pressure changes in the medium it is travelling in. These pressure changes are measured by a microphone and converted into signal levels. The most direct way to visualize this captured information is to plot these values directly.

In [None]:
Fs, wav = wavfile.read('data/italian.wav')
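
For readers who want to see what `wavfile.read` returns without the lab's data file, here is a minimal sketch that writes a synthetic tone to a temporary file and reads it back. The 440 Hz tone, the 8 kHz rate, and the names `demo_rate` and `demo_wav` are illustrative assumptions, not part of the lab data.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Hypothetical stand-in for a recording: 0.5 s of a 440 Hz tone at 8 kHz.
demo_rate = 8000
t = np.arange(demo_rate // 2) / demo_rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Write the tone to a temporary WAV file, then read it back.
path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
wavfile.write(path, demo_rate, tone)
rate_read, demo_wav = wavfile.read(path)

print(rate_read)        # sampling rate in Hz
print(demo_wav.dtype)   # sample type stored in the file
print(demo_wav.shape)   # (samples,) for mono, (samples, channels) for stereo
```

The second return value is a plain NumPy array, so it can be sliced and plotted like any other array.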

Plotting the audio signal

In [None]:
from IPython.display import Audio

In [None]:
Audio('data/italian.wav')

In [None]:
fig = plt.figure(figsize=(16,5))

plt.subplot(1, 2, 1)
plt.plot(wav);
plt.title('Italian speech')

## Exercise 1 - Baby steps

1. Write a function to compute the length of the audio file in seconds.
2. Write a function to plot a short section of the audio clip instead of the whole length.

In [None]:
def audioLength(signal, samplingRate):
    #signal is a single channel of the audio file, samplingRate is the, well, the sampling rate of the signal
    #If you are lost here, check the documentation for scipy.io.wavfile :)
    #your code here

    return duration

def getWindow(signal, start, end, windowFunc=None):
    #ignore windowFunc for now
    #signal is a single channel of your audio file
    #The function should return the signal values from [start,end), the value indexed by end is excluded
    
    #your code here

    return section


print(audioLength(wav, Fs))
plt.plot(getWindow(wav, 100000, 200000))

## Better Visualization

It is difficult to see what's happening in the audio signal from the plots above. To analyze audio content, in applications such as speaker recognition or audio content identification, a necessary tool is the **spectrogram**. The spectrogram can be used to visualize the frequency content of the audio signal as it progresses over time.

Mathematically, the spectrogram is the **squared-magnitude** of 
the Fourier transform of overlapping segments, or windows, of the audio signal.
To generate the spectrogram, the signal must first be separated into
overlapping segments. If we denote the signal as 
$\vec{x} = [x_{0}, x_{1},..., x_{L-1}]$, a one-dimensional vector of $L$ samples,
then the segments would be given as
$$ 
\vec{x}_{0}=[x_{0}, x_{1},..., x_{N-1}],\\
\vec{x}_{1}=[x_{M}, x_{M+1},..., x_{M+N-1}],\\
\vdots\\
\vec{x}_{i}=[x_{iM}, x_{iM + 1},..., x_{iM+N-1}],
$$
where $M$ is the step size between windows
and $N$ is the length of each window. To generate a smoother spectrogram,
it is common to multiply the windows element-wise with a 
*windowing filter* $\vec{w}$. A popular choice of a window
filter is the Hamming window.
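
The segmentation described above can be sketched with NumPy indexing. This is only an illustration on a toy signal: the helper name `frame_signal` is hypothetical, and the sketch assumes the signal holds at least $N$ samples and simply drops any trailing samples that do not fill a whole window.

```python
import numpy as np

def frame_signal(x, N, M):
    """Split x into overlapping windows of length N taken every M samples."""
    num_frames = 1 + (len(x) - N) // M          # only windows that fit entirely
    starts = M * np.arange(num_frames)          # i*M for each window i
    idx = starts[:, None] + np.arange(N)        # row i holds indices iM .. iM+N-1
    return x[idx]

x = np.arange(10.0)                 # toy signal: 0, 1, ..., 9
frames = frame_signal(x, N=4, M=2)
print(frames)
# [[0. 1. 2. 3.]
#  [2. 3. 4. 5.]
#  [4. 5. 6. 7.]
#  [6. 7. 8. 9.]]

windowed = frames * np.hamming(4)   # element-wise product with w, row by row
```

Each row is one segment $\vec{x}_{i}$, and consecutive rows share $N - M$ samples.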

### Windowing Functions

Using hard cut-offs at the boundaries of the windows can cause various undesirable artifacts. To reduce these effects, windowing functions can be applied to these rectangular clips. NumPy provides `hamming()` to generate what is known as the Hamming window. We apply this window to the signal we obtained above by performing an element-wise multiplication.

Note:
Not multiplying the signal by any fancy windowing function is sometimes called the rectangular window.

In [None]:
N = 200000 - 100000
w = np.hamming(N)      # generate a Hamming window of length N
s = wav[100000:200000] # how many samples does s have? N or N+1?

plt.figure()
plt.plot(w) # plot the window
plt.title('Hamming Window')
ax = plt.gca()  # get the current axes; plt.axes() would add a new, empty one on top
ax.set_xlabel('time (samples)')
ax.set_ylabel('amplitude')

### Exercise 2

Now modify the definition of your `getWindow()` function so that it applies the Hamming window to the signal that was obtained.

In [None]:
def getWindow(signal, start, end, windowFunc=np.hamming):
    #signal is a single channel of your audio file
    #The function should return the signal values from [start,end), the value indexed by end is excluded
    #windowFunc should be a function that generates a window of a given length, here we will just pass in np.hamming
    #your code here

    return section

plt.plot(getWindow(wav, 100000, 200000))

### Fourier Transform

Applying a Fourier transform to a signal allows us to view its frequency content.

To generate the frequency content for the spectrogram, the **Fourier transform** is applied
to the windowed segments of the input and the magnitude of the result is squared
and stored,

$$\vec{f}_{i} = \left\|\mathcal{FFT}\left(\vec{w}\odot \vec{x}_{i}\right)\right\|^{2},$$

where $\odot$ represents element-wise multiplication. Note that the Fourier transform
produces both negative and positive frequencies, but the content of the negative frequencies is
redundant, since the spectrogram stores the *magnitude* of the FT result and we are dealing
with *real* signals. Therefore, only $\vec{f}_{i,[0:N/2 + 1]}$ is needed. The function **rfft()** takes
care of this for you.
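
As a quick check of the redundancy argument, the sketch below compares `np.fft.fft` and `np.fft.rfft` on a synthetic real-valued tone. The 1000 Hz sampling rate and the 50 Hz tone are assumptions chosen so that the tone falls exactly on a frequency bin; they are not taken from the lab audio.

```python
import numpy as np

rate = 1000                               # assumed sampling rate in Hz
n = 1000                                  # number of samples (1 second)
t = np.arange(n) / rate
sig = np.sin(2 * np.pi * 50 * t)          # real-valued 50 Hz tone

full = np.fft.fft(sig)                    # n complex values, both signs of frequency
half = np.fft.rfft(sig)                   # only the n//2 + 1 non-negative frequencies

print(full.size, half.size)               # 1000 501
# For a real signal, bin n-k is the complex conjugate of bin k,
# so the two bins have the same magnitude.
print(np.allclose(np.abs(full[1:]), np.abs(full[1:][::-1])))   # True

power = np.abs(half) ** 2                 # squared magnitude, as in the formula above
freqs = np.fft.rfftfreq(n, d=1.0 / rate)  # frequency in Hz of each rfft bin
print(freqs[np.argmax(power)])            # 50.0
```

`np.fft.rfftfreq` returns the frequency of each `rfft` bin, which makes it a convenient x-axis for magnitude plots.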

In [None]:
f = np.fft.rfft(s)                   # Fourier transform of signal, keeping only the positive frequencies

freq = np.fft.rfftfreq(s.size, d=1./Fs)    # frequency (Hz) of each rfft bin, from 0 to Fs/2

fig_fft = plt.figure()
plt.plot(freq, np.absolute(f))
plt.title('Magnitude of Fourier Transform of Signal')
ax = plt.gca()                       # get the current axes rather than creating a new one
ax.set_xlabel('frequency (Hz)')
ax.set_ylabel('magnitude')
ax.set_xlim(0, Fs/2.)                # up to the Nyquist frequency

### Exercise 3

a. What's the length of `f` in the code segment above?

b. Plot only the [0, 1250] frequency range of the above Fourier transform.

c. Plot the frequency content of a rectangular window and a Hamming window. Give a brief description (one sentence) of the differences between these windows.

d. Plot the FFT of a window of the signal using the rectangular window and one with a Hamming window. For the plots, you might want to play with the scales on the axes to see the differences better.

### Frequency Domain Visualization

Here we will use a built-in function in matplotlib to plot the spectrogram of the audio signal. The spectrogram is computed from an overlapping sliding window of the audio signal, with the windowing function applied. This is typically called the Short-Time Fourier Transform (STFT) of the audio signal. Each column in the plot represents a window of the signal, the y-axis represents the frequency, and the color represents the magnitude.
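
To make the column-per-window layout concrete, here is a rough NumPy sketch of a squared-magnitude STFT on a toy signal. It is not matplotlib's implementation: `specgram` applies its own scaling by default, so the values will differ, and the helper name `spectrogram`, the 8 kHz rate, and the two-tone test signal are assumptions made for illustration. A step of $M$ samples corresponds to an overlap of $N - M$ samples between neighboring windows.

```python
import numpy as np

def spectrogram(x, N, M):
    """Squared-magnitude STFT: one column per window, one row per frequency bin."""
    w = np.hamming(N)
    starts = range(0, len(x) - N + 1, M)                 # window i starts at sample i*M
    cols = [np.abs(np.fft.rfft(w * x[s:s + N])) ** 2 for s in starts]
    return np.array(cols).T                              # shape (N//2 + 1, number of windows)

rate = 8000                                              # assumed sampling rate in Hz
t = np.arange(rate) / rate                               # 1 second of samples
# Toy signal: 500 Hz for the first half second, 2000 Hz for the second half.
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

S = spectrogram(x, N=256, M=64)
freqs = np.fft.rfftfreq(256, d=1.0 / rate)
print(S.shape)                                           # (129, 122)
print(freqs[np.argmax(S[:, 0])], freqs[np.argmax(S[:, -1])])   # 500.0 2000.0
```

The strongest frequency moves from 500 Hz in the first column to 2000 Hz in the last, which is the kind of change over time that the spectrogram plot shows as color.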

In [None]:
fig, (ax1) = plt.subplots(ncols=1) # create plot
fig.set_size_inches(16, 5)

N=1024
M=128

# generate & plot spectrogram (built-in function)
data, freqs, bins, im = ax1.specgram(wav[130000:170000],Fs = Fs, NFFT=N, noverlap=(N-M), window = np.hamming(N), cmap = 'jet')   
ax1.axis('tight')
ax1.set_title('Spectrogram of Channel 0')
ax1.set_ylabel('frequency (Hz)')   # Fs is passed to specgram, so the y-axis is in Hz
ax1.set_xlabel('time (s)')         # and the x-axis is in seconds

## Exercise 4

1. Looking at the spectrogram above, are you able to guess the number of words in this speech? Briefly explain your reasoning.
2. Now listen to the provided wav file. Are you able to confirm your answer?

In [None]:
Audio(wav[130000:170000], rate=Fs)

# Part 2 - Digital Images

In this part, we will be looking at digital image representation.

In [None]:
from PIL import Image

In [None]:
# Read in the image file and convert it into a numpy array
img = np.array(Image.open('data/roulettes.jpg'))

plt.imshow(img)
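
As a small illustration of the array representation, the sketch below builds a tiny RGB image by hand. The 2x3 size and the name `toy` are hypothetical. An 8-bit RGB image converted with `np.array` has the same layout: rows first, then columns, then color channels.

```python
import numpy as np

# Hypothetical 2x3 image: 2 rows (height), 3 columns (width), 3 color channels.
toy = np.zeros((2, 3, 3), dtype=np.uint8)
toy[0, :, 0] = 255          # top row: pure red
toy[1, :, 2] = 255          # bottom row: pure blue

print(toy.shape)            # (2, 3, 3) -> (height, width, channels)
print(toy.dtype)            # uint8, so each channel value lies in 0..255
print(toy[0, 0])            # [255   0   0], the RGB triple of the top-left pixel
```

Passing `toy` to `plt.imshow` would display a red row above a blue row.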

## Exercise 5

Please write code to answer the following questions: 

a. Print the height and width of the image.

b. Calculate the compression ratio of this JPEG image. Recall that an image is supposed to be made of pixels and each pixel is about 3 bytes.

c. Show the R, G, and B channels separately in a row, by making the other two channels all zeros.

d. Plot the histogram of each channel.

e. Extra credit: implement a method of segmenting the planes from the sky. Plot the segmentation result.
