# Short-Time Speech Processing
Here we demonstrate the computation of short-time energy and zero-crossing rate (ZCR) on a recording of the word 'sita'.

In [None]:
import librosa
import matplotlib.pyplot as plt
import numpy as np

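# sr=None keeps the file's native sampling rate (librosa otherwise resamples to 22050 Hz)
signal, sampling_rate = librosa.load('sita.wav', sr=None)

%matplotlib inline
plt.plot(np.arange(len(signal)) / sampling_rate, signal)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')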

The signal is processed block by block, with each block multiplied by a window function. Two common choices are the Hann (also called Hanning) window and the Hamming window.
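Both windows are raised cosines. For a symmetric window of length $N$ and $n = 0, \dots, N-1$, the Hamming window is $w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$ and the Hann window is $w[n] = 0.5 - 0.5\cos\left(\frac{2\pi n}{N-1}\right)$. The minimal NumPy sketch below implements these formulas directly. The helper names and the `periodic` flag are our own (not a library API); `periodic=True` divides by $N$ instead of $N-1$, which gives the periodic variant that `librosa.filters.get_window` returns by default (`fftbins=True`).

```python
import numpy as np


def raised_cosine(N, a0, periodic=False):
    """w[n] = a0 - (1 - a0) * cos(2*pi*n / D) for n = 0..N-1.

    D = N - 1 gives the symmetric window; D = N gives the periodic one.
    """
    n = np.arange(N)
    D = N if periodic else N - 1
    return a0 - (1 - a0) * np.cos(2 * np.pi * n / D)


def hamming(N, periodic=False):
    # Hamming: a0 = 0.54, so the end points stay at about 0.08 instead of 0
    return raised_cosine(N, 0.54, periodic)


def hann(N, periodic=False):
    # Hann: a0 = 0.5, so the end points fall all the way to 0
    return raised_cosine(N, 0.5, periodic)


# The symmetric versions agree with NumPy's built-in windows
print(np.allclose(hamming(512), np.hamming(512)))
print(np.allclose(hann(512), np.hanning(512)))
```

The nonzero end points of the Hamming window are the visible difference in the plot: the Hann window tapers each block to zero, while the Hamming window trades that for lower nearby sidelobes.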

In [None]:
# 512-point windows; librosa returns the periodic form by default (fftbins=True)
window = librosa.filters.get_window('hann', Nx=512)
window2 = librosa.filters.get_window('hamming', Nx=512)
plt.figure()
plt.plot(window)
plt.plot(window2)
plt.legend(['Hann', 'Hamming'])
plt.ylabel(r'$w[n]$')
plt.xlabel(r'$n$')

In [None]:
512 / sampling_rate  # duration of one 512-sample block in seconds (0.032 s at 16 kHz)
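The same arithmetic gives the time step between blocks, the number of blocks per second, and the overlap between neighboring blocks. This sketch assumes a 16 kHz sampling rate and the 512-sample block and 256-sample shift used later in this notebook; `sampling_rate_hz` is a separate, hypothetical name so that the `sampling_rate` loaded from the file is not overwritten.

```python
sampling_rate_hz = 16000  # assumed sampling rate of sita.wav
block_size = 512          # samples per block
shift_size = 256          # samples between the starts of consecutive blocks

block_duration = block_size / sampling_rate_hz     # seconds covered by one block
shift_duration = shift_size / sampling_rate_hz     # time step between blocks
frames_per_second = sampling_rate_hz / shift_size  # blocks produced per second
overlap = 1 - shift_size / block_size              # fraction shared by neighboring blocks

print(f"block {block_duration * 1000:.0f} ms, shift {shift_duration * 1000:.0f} ms, "
      f"{frames_per_second} frames/s, {overlap:.0%} overlap")
# block 32 ms, shift 16 ms, 62.5 frames/s, 50% overlap
```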

Let us show how a short section of this speech signal is divided into overlapping blocks, each weighted by a Hamming window.

In [None]:
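block_size = 512   # samples per block
shift_size = 256   # hop between consecutive blocks (50% overlap)
num_blocks = 10

# Take num_blocks * block_size samples starting 1 s into the recording;
# the time axis below is relative to the start of this segment
short_segment = signal[sampling_rate: sampling_rate + num_blocks * block_size]

plt.figure()
plt.plot(np.arange(len(short_segment)) / sampling_rate, short_segment)

# Overlay one Hamming window per block position; with a 50% shift,
# (len - block_size) // shift_size + 1 = 2 * num_blocks - 1 windows fit
num_windows = (len(short_segment) - block_size) // shift_size + 1
for i in range(num_windows):
    plt.plot((np.arange(len(window2)) + i * shift_size) / sampling_rate, window2)
plt.xlabel('Time (s)')
plt.xlim([0, 0.1])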
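The overlay above can also be expressed as a matrix of windowed blocks, one row per block position. `frame_signal` is a hypothetical helper written for illustration (not a librosa API); it assumes the input is at least one block long and, unlike librosa's feature functions, does not pad the ends of the signal.

```python
import numpy as np


def frame_signal(x, block_size, shift_size, window):
    """Return a (num_blocks, block_size) array of windowed blocks of x.

    Assumes len(x) >= block_size; trailing samples that do not fill a
    whole block are dropped (no padding).
    """
    num_blocks = 1 + (len(x) - block_size) // shift_size
    # Row i indexes samples i*shift_size .. i*shift_size + block_size - 1
    idx = shift_size * np.arange(num_blocks)[:, None] + np.arange(block_size)[None, :]
    return x[idx] * window[None, :]


# 10 * 512 samples with a 256-sample shift give 19 blocks,
# the same count as the windows drawn in the plot above
x = np.random.default_rng(0).standard_normal(10 * 512)
blocks = frame_signal(x, 512, 256, np.hamming(512))
print(blocks.shape)  # (19, 512)
```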

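We use `librosa` to compute the short-time root-mean-square (RMS) energy and the zero-crossing rate of the whole signal, with the same block size (512 samples) and shift (256 samples) as above.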
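With block size $N$, shift $R$ and window $w$, the short-time energy of block $m$ is $E_m = \sum_{n=0}^{N-1} \left(x[mR+n]\,w[n]\right)^2$. `librosa.feature.rms` reports a root-mean-square value per frame instead, which has the same units as the signal. The sketch below computes both quantities directly with NumPy on unpadded frames; the function names are hypothetical, and the values need not match librosa's output exactly, since librosa pads and centers frames by default.

```python
import numpy as np


def frames(x, N, R):
    """Unpadded (num_frames, N) array of x with shift R; assumes len(x) >= N."""
    num_frames = 1 + (len(x) - N) // R
    idx = R * np.arange(num_frames)[:, None] + np.arange(N)[None, :]
    return x[idx]


def short_time_energy(x, N, R, window):
    # E_m = sum_n (x[mR + n] * w[n])**2
    return np.sum((frames(x, N, R) * window) ** 2, axis=1)


def short_time_rms(x, N, R):
    # RMS_m = sqrt(mean_n x[mR + n]**2), computed on rectangular (unwindowed) frames
    return np.sqrt(np.mean(frames(x, N, R) ** 2, axis=1))


# A sine of amplitude 0.5 has RMS 0.5 / sqrt(2) (about 0.354) in every frame
# that spans a whole number of periods; 512 samples hold exactly 8 periods of 64
n = np.arange(4096)
tone = 0.5 * np.sin(2 * np.pi * n / 64)
print(short_time_rms(tone, 512, 256))
```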

In [None]:
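# Both functions return arrays of shape (1, num_frames); frames are centered by default
rmse = librosa.feature.rms(y=signal, frame_length=block_size, hop_length=shift_size)
zcr = librosa.feature.zero_crossing_rate(y=signal, frame_length=block_size, hop_length=shift_size)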
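The zero-crossing rate counts how often consecutive samples change sign within a block, normalized here by the block length: $Z_m = \frac{1}{N}\sum_{n=1}^{N-1} \left[\operatorname{sgn} x[mR+n] \ne \operatorname{sgn} x[mR+n-1]\right]$. A minimal NumPy sketch follows. The function name is our own, zero is treated as non-negative, and frames are unpadded, so values may differ slightly from `librosa.feature.zero_crossing_rate`. A pure tone of frequency $f$ sampled at $f_s$ crosses zero about $2f/f_s$ times per sample, which makes it a convenient check.

```python
import numpy as np


def zero_crossing_rate(x, N, R):
    """Sign changes per sample in each unpadded block of length N (shift R)."""
    num_frames = 1 + (len(x) - N) // R
    idx = R * np.arange(num_frames)[:, None] + np.arange(N)[None, :]
    negative = np.signbit(x[idx])                 # zero counts as non-negative
    changes = negative[:, 1:] != negative[:, :-1]  # True where the sign flips
    return np.count_nonzero(changes, axis=1) / N


fs = 16000
n = np.arange(fs // 2)
# 1 kHz tone; the 0.1 rad phase offset keeps samples away from exact zeros
tone = np.sin(2 * np.pi * 1000 * n / fs + 0.1)
print(zero_crossing_rate(tone, 512, 256))  # each value close to 2 * 1000 / 16000 = 0.125
```

High-frequency, noise-like sounds produce many sign changes per block, while low-frequency voiced sounds produce few, which is why the ZCR is plotted alongside the energy.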

In [None]:
# Time (s) of each frame; with centered frames, frame m sits on sample m * shift_size
times = librosa.times_like(rmse, sr=sampling_rate, hop_length=shift_size)

plt.figure()
plt.plot(np.arange(len(signal)) / sampling_rate, signal)
plt.plot(times, rmse[0], 'r')
plt.plot(times, zcr[0], 'g')
plt.xlabel('Time (s)')
plt.legend(['Signal', 'RMS energy', 'ZCR'])
plt.xlim([1, 1.5])
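Placing the feature curves on the time axis depends on how frames are positioned. `librosa.feature.rms` and `librosa.feature.zero_crossing_rate` center frames by default (`center=True`), so frame $m$ is centered on sample $mR$ and belongs at time $mR/f_s$; for an even frame length this padding yields $1 + \lfloor L/R \rfloor$ frames for a signal of $L$ samples. For unpadded frames, such as those built by slicing, the center of a block of $N$ samples lies $(N-1)/2$ samples after its start. The helper below (a hypothetical name) covers both cases.

```python
import numpy as np


def frame_center_times(num_frames, hop_length, sampling_rate, frame_length=None, centered=True):
    """Time in seconds of the center of each analysis frame."""
    start = np.arange(num_frames) * hop_length
    if centered:
        # Padded, centered frames: frame m is centered on sample m * hop_length
        center = start
    else:
        # Unpadded frames: frame m spans samples start .. start + frame_length - 1
        center = start + (frame_length - 1) / 2
    return center / sampling_rate


# Centered frames at 0, 16 and 32 ms for a 256-sample hop at 16 kHz
print(frame_center_times(3, 256, 16000))
# Unpadded 512-sample frames are shifted by about 16 ms relative to these
print(frame_center_times(3, 256, 16000, frame_length=512, centered=False))
```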