# RMS and Zero-Crossing Rate
How to extract root-mean-square energy (RMSE) and
zero-crossing rate (ZCR) from audio data using the Python library
librosa.
I also show how RMSE and ZCR vary depending on music genre and type
of audio source (e.g., voice vs noise).

https://www.youtube.com/watch?v=EycaSbIRx-0&list=PL-wATfeyAMNqIee7cH3q1bh4QJFAaeNv0&index=10



<b>Sound power</b>

● Rate at which energy is transferred
● Energy per unit of time emitted by a sound source in all directions
● Measured in watts (W)

<b>Sound intensity</b>

● Sound power per unit area
● Measured in W/m²

<b>Intensity level</b>

● Logarithmic scale
● Measured in decibels (dB)
● Ratio between two intensity values
● Uses a reference intensity: the threshold of hearing (TOH)
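
As a rough illustration of the logarithmic scale, the sketch below converts an intensity to decibels relative to the threshold of hearing. The value `1e-12` W/m² for the TOH is the conventional reference and is an assumption here, as is the function name `intensity_level_db`.

```python
import math

# Assumption: the threshold of hearing (TOH) is taken as 1e-12 W/m^2,
# the conventional reference intensity.
I_TOH = 1e-12

def intensity_level_db(intensity):
    """Intensity level in dB of `intensity` (W/m^2) relative to I_TOH."""
    return 10 * math.log10(intensity / I_TOH)

print(intensity_level_db(I_TOH))  # the reference itself sits at 0 dB
print(intensity_level_db(1e-6))   # a million times the reference: about 60 dB
```

Every tenfold increase in intensity adds 10 dB.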

<b>Loudness</b>

● Subjective perception of sound intensity
● Depends on the duration and frequency of a sound
● Depends on age
● Measured in phons

<b>Complex sound</b>

● Superposition of sinusoids
● A partial is a sinusoid used to describe a sound
● The frequency of the lowest partial is called the fundamental frequency
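
A minimal sketch of the idea, assuming three made-up partials (the frequencies and amplitudes below are illustrative, not taken from any recording): the complex sound is the sum of the sinusoids, and the fundamental frequency is that of the lowest one.

```python
import numpy as np

sr = 22050                       # sampling rate in Hz (illustrative)
t = np.arange(sr) / sr           # one second of sample times

# Hypothetical partials as (frequency in Hz, amplitude) pairs
partials = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25)]

# Complex sound = superposition (sum) of the sinusoidal partials
complex_sound = sum(a * np.sin(2 * np.pi * f * t) for f, a in partials)

# The lowest partial gives the fundamental frequency
fundamental = min(f for f, _ in partials)
print(fundamental)  # 220.0
```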

<b>Analog to digital conversion</b>
● Sampling
● Quantization

<b>Sampling rate</b>  $s_r = 1/T$, where $T$ is the sampling period

<b>Nyquist frequency</b>  $f_N = s_r / 2$
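
As a quick worked example of the two definitions above, here are the sampling period and Nyquist frequency for 22050 Hz, which is the rate `librosa.load` resamples to by default. The variable names are illustrative.

```python
sr = 22050            # sampling rate in Hz (librosa.load's default)
T = 1 / sr            # sampling period in seconds
nyquist = sr / 2      # highest frequency representable without aliasing

print(T)        # roughly 4.5e-05 seconds between samples
print(nyquist)  # 11025.0
```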

<b>Dynamic range</b>  Difference between the largest and smallest signal a system can record

<b>Signal-to-quantization-noise ratio (SQNR)</b>
● Relationship between max signal strength and
quantization error
● Correlates with dynamic range
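
The notes above do not give a formula for SQNR. A commonly used approximation, assumed here, is the ratio of the number of quantization levels to one level, expressed in dB, which works out to about 6.02 dB per bit. The function name `sqnr_db` is hypothetical.

```python
import math

def sqnr_db(bit_depth):
    """Approximate SQNR in dB: 20 * log10(2 ** bit_depth), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(bits, round(sqnr_db(bits), 1))
```

Under this approximation, 16-bit audio comes out at roughly 96 dB, which is why a higher bit depth goes hand in hand with a larger dynamic range.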


<b>Signal domain</b>
 1. Time domain:  Amplitude envelope, Root-mean square energy,
   Zero crossing rate
 2. Frequency domain: Band energy ratio, Spectral centroid, Spectral flux 
 3. Time-frequency representation:  Spectrogram, Mel-spectrogram, Constant-Q transform

<H1>Audio Signal</H1> 

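<b>Amplitude envelope (AE)</b>
● Max amplitude value of all samples in a frame
● Gives a rough idea of loudness
● Sensitive to outliers
● Applications: onset detection, music genre classification

$AE_t = \max_{k = tK}^{(t+1)K - 1} s(k)$ is the amplitude envelope at frame $t$, where $K$ is the frame size and $s(k)$ is the amplitude of the $k$-th sample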
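
A minimal from-scratch sketch of the amplitude envelope, following the definition above (the maximum of the samples in each frame). The `hop_length` parameter is an addition that allows overlapping frames; setting it equal to `frame_size` reproduces the non-overlapping frames of the formula.

```python
import numpy as np

def amplitude_envelope(signal, frame_size, hop_length):
    """Maximum sample value of each frame of `signal`."""
    return np.array([signal[i:i + frame_size].max()
                     for i in range(0, len(signal), hop_length)])

signal = np.array([0.1, -0.2, 0.5, 0.3, -0.9, 0.4, 0.2, 0.1])
ae = amplitude_envelope(signal, frame_size=4, hop_length=4)
print(ae)  # frame maxima: 0.5 and 0.4
```

Note that the second frame contains -0.9, yet its envelope is 0.4, because the definition takes the maximum of the signed samples. Taking `np.abs` of the frame first would be a reasonable variant if peak magnitude is what matters.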
 
<b>Root-mean-square energy</b>   
● RMS of all samples in a frame t

  $$RMS_t = \sqrt{\frac{1}{K} \sum_{k = tK}^{(t+1)K - 1} s(k)^2}$$

    Amplitude of the k-th sample:  s(k)
    Energy of the k-th sample:  s(k)^2
    Sum of the energy of all samples in frame t:  SIGMA s(k)^2, for k = t*K to (t+1)*K - 1
    Number of samples in a frame:  K
    
● Indicator of loudness
● Less sensitive to outliers than AE

● Applications: audio segmentation, music genre classification
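
To make the point about outliers concrete, here is a small sketch on a synthetic frame (the values are made up): a quiet frame with one spike. The amplitude envelope jumps to the spike, while the RMS barely moves.

```python
import numpy as np

# One quiet frame of 1024 samples with a single spike (an outlier)
frame = np.full(1024, 0.1)
frame[100] = 1.0

ae = frame.max()                      # amplitude envelope of the frame
rms = np.sqrt(np.mean(frame ** 2))    # root-mean-square energy of the frame

print(ae)   # 1.0, dominated by the single spike
print(rms)  # stays close to the 0.1 level of the other samples
```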

<b>Zero crossing rate (ZCRt)</b>
● Number of times a signal crosses the horizontal axis

$$ZCR_t = \frac{1}{2} \sum_{k = tK}^{(t+1)K - 1} \left| \mathrm{sgn}(s(k)) - \mathrm{sgn}(s(k+1)) \right|$$

<b>Zero crossing rate applications</b>
● Recognition of percussive vs pitched sounds
● Monophonic pitch estimation
● Voiced/unvoiced decision for speech signals
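
The ZCR formula above can be sketched from scratch as follows. This is an illustrative implementation, not librosa's: it returns a raw crossing count per frame (librosa returns a fraction), and each frame includes one extra sample so that the pair s(k), s(k+1) at the frame boundary is counted, as in the formula.

```python
import numpy as np

def zero_crossing_count(signal, frame_size, hop_length):
    """Count sign changes per frame: 0.5 * sum |sgn(s(k)) - sgn(s(k+1))|."""
    counts = []
    for i in range(0, len(signal), hop_length):
        # one extra sample so the last term of the sum, s(k+1), is available
        frame = signal[i:i + frame_size + 1]
        signs = np.sign(frame)
        counts.append(0.5 * np.sum(np.abs(np.diff(signs))))
    return np.array(counts)

signal = np.array([1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0])
zc = zero_crossing_count(signal, frame_size=4, hop_length=4)
print(zc)  # 4 crossings in the alternating frame, 0 in the constant one
```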

In [None]:
import os
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd

## Loading Audio Files

In [None]:
# Folder containing the example WAV files (adjust to your machine)
BASE_FOLDER = "C:/Users/rockman/Music/wav"

debussy_file = os.path.join(BASE_FOLDER, "debussy.wav")
redhot_file = os.path.join(BASE_FOLDER, "redhot.wav")
duke_file = os.path.join(BASE_FOLDER, "duke.wav")

In [None]:
ipd.Audio(debussy_file)

In [None]:
ipd.Audio(redhot_file)

In [None]:
ipd.Audio(duke_file)

In [None]:
# load audio files with librosa (by default: converted to mono and resampled to sr = 22050 Hz)
debussy, sr = librosa.load(debussy_file)
redhot, _ = librosa.load(redhot_file)
duke, _ = librosa.load(duke_file)

## Root-mean-squared energy with Librosa

In [None]:
FRAME_SIZE = 1024
HOP_LENGTH = 512

In [None]:
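# librosa.feature.rms returns an array of shape (1, n_frames); [0] selects the per-frame values.
# The signal is passed as y=... because `y` is keyword-only in librosa 0.10 and later.
rms_debussy = librosa.feature.rms(y=debussy, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
# kept without [0] to show the raw 2-D output in the next cell
rms_debussy1 = librosa.feature.rms(y=debussy, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)
rms_redhot = librosa.feature.rms(y=redhot, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
rms_duke = librosa.feature.rms(y=duke, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]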

In [None]:
# 1-D array: one RMS value per frame
print("len(rms_debussy) =", len(rms_debussy))
print("rms_debussy.shape =", rms_debussy.shape)
print("RMS value for each frame in the Debussy WAV file:\n", rms_debussy)

print("\n\n-------------------\n")
# 2-D array of shape (1, n_frames), so len() is 1
print("len(rms_debussy1) =", len(rms_debussy1))
print("rms_debussy1.shape =", rms_debussy1.shape)
print("RMS value for each frame in the Debussy WAV file:\n", rms_debussy1)

## Visualise RMSE + waveform

In [None]:
"""
librosa.frames_to_time(frames, hop_length=HOP_LENGTH)
Returns
     times : np.ndarray [shape=(n,)]
         time (in seconds) of each given frame number:
         times[i] = frames[i] * hop_length / sr
(sr is left at its default of 22050 Hz, matching the default of librosa.load)
 """


frames = range(len(rms_debussy))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)
t
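
The conversion described in the docstring above can be reproduced by hand. The sketch below assumes the default sampling rate of 22050 Hz and a hop length of 512, and uses only NumPy.

```python
import numpy as np

sr = 22050          # librosa's default sampling rate
hop_length = 512

frames = np.arange(4)
times = frames * hop_length / sr   # times[i] = frames[i] * hop_length / sr
print(times)
```

Each frame therefore starts about 23 ms after the previous one.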

In [None]:
# RMS energy is plotted in red on top of each waveform.
# librosa.display.waveshow replaces waveplot, which was removed in librosa 0.10.

plt.figure(figsize=(15, 17))

ax = plt.subplot(3, 1, 1)
librosa.display.waveshow(debussy, alpha=0.5)
plt.plot(t, rms_debussy, color="r")
plt.ylim((-1, 1))
plt.title("Debussy")

plt.subplot(3, 1, 2)
librosa.display.waveshow(redhot, alpha=0.5)
plt.plot(t, rms_redhot, color="r")
plt.ylim((-1, 1))
plt.title("RHCP")

plt.subplot(3, 1, 3)
librosa.display.waveshow(duke, alpha=0.5)
plt.plot(t, rms_duke, color="r")
plt.ylim((-1, 1))
plt.title("Duke Ellington")

plt.show()

## RMSE from scratch

In [None]:
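def rmse(signal, frame_size, hop_length):
    """Return the RMS energy of each frame of `signal` as a NumPy array."""
    rms_values = []

    # calculate RMSE for each frame; a short final frame is still divided by
    # frame_size, which is equivalent to zero-padding it
    for i in range(0, len(signal), hop_length):
        rmse_current_frame = np.sqrt(np.sum(signal[i:i+frame_size]**2) / frame_size)
        rms_values.append(rmse_current_frame)
    return np.array(rms_values)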

In [None]:
rms_debussy1 = rmse(debussy, FRAME_SIZE, HOP_LENGTH)
rms_redhot1 = rmse(redhot, FRAME_SIZE, HOP_LENGTH)
rms_duke1 = rmse(duke, FRAME_SIZE, HOP_LENGTH)

In [None]:
# Plot our homemade RMS function (yellow) and the librosa function (red)
# on the same graph.
# librosa.display.waveshow replaces waveplot, which was removed in librosa 0.10.
plt.figure(figsize=(15, 17))

ax = plt.subplot(3, 1, 1)
librosa.display.waveshow(debussy, alpha=0.5)
plt.plot(t, rms_debussy, color="r")
plt.plot(t, rms_debussy1, color="y")
plt.ylim((-1, 1))
plt.title("Debussy")

plt.subplot(3, 1, 2)
librosa.display.waveshow(redhot, alpha=0.5)
plt.plot(t, rms_redhot, color="r")
plt.plot(t, rms_redhot1, color="y")
plt.ylim((-1, 1))
plt.title("RHCP")

plt.subplot(3, 1, 3)
librosa.display.waveshow(duke, alpha=0.5)
plt.plot(t, rms_duke, color="r")
plt.plot(t, rms_duke1, color="y")
plt.ylim((-1, 1))
plt.title("Duke Ellington")

plt.show()

## Zero-crossing rate with Librosa

In [None]:
#https://librosa.org/doc/latest/feature.html#spectral-features
#https://librosa.org/doc/latest/generated/librosa.feature.zero_crossing_rate.html#librosa.feature.zero_crossing_rate
zcr_debussy = librosa.feature.zero_crossing_rate(debussy, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_redhot = librosa.feature.zero_crossing_rate(redhot, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_duke = librosa.feature.zero_crossing_rate(duke, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]

 
# librosa.feature.zero_crossing_rate returns zcr : np.ndarray [shape=(1, t)],
# where zcr[0, i] is the fraction of zero crossings in the i-th frame
zcr_debussy

In [None]:
zcr_debussy.size

## Visualise zero-crossing rate with Librosa

In [None]:
plt.figure(figsize=(15, 10))

plt.plot(t, zcr_debussy, color="y")
plt.plot(t, zcr_redhot, color="r")
plt.plot(t, zcr_duke, color="b")
plt.ylim(0, 1)  # zero_crossing_rate is normalized to the range [0, 1]
plt.show()

In [None]:
# Non-normalized ZCR: multiplying the rate by FRAME_SIZE gives the number of
# zero crossings per frame, which is what the y axis shows
plt.figure(figsize=(15, 10))

plt.plot(t, zcr_debussy * FRAME_SIZE , color="y")
plt.plot(t, zcr_redhot * FRAME_SIZE, color="r")
plt.plot(t, zcr_duke * FRAME_SIZE, color="b")
plt.ylim(0, 500)
plt.show()

## ZCR: Voice vs Noise

In [None]:

voice_file = os.path.join(BASE_FOLDER, "voice.wav")
noise_file = os.path.join(BASE_FOLDER, "noise.wav")
voice_file

In [None]:
ipd.Audio(voice_file)

In [None]:
ipd.Audio(noise_file)

In [None]:
# load the first DURATION seconds of each audio file
DURATION = 15
voice, _ = librosa.load(voice_file, duration=DURATION)
noise, _ = librosa.load(noise_file, duration=DURATION)

In [None]:
# get ZCR
zcr_voice = librosa.feature.zero_crossing_rate(voice, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_noise = librosa.feature.zero_crossing_rate(noise, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]

In [None]:
frames = range(len(zcr_voice))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)

In [None]:
plt.figure(figsize=(15, 10))

plt.plot(t, zcr_voice, color="y")
plt.plot(t, zcr_noise, color="r")
plt.ylim(0, 1)
plt.show()
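
If the voice and noise recordings are not at hand, the contrast can be approximated with synthetic stand-ins. The sketch below is only an illustration under stated assumptions: a 220 Hz sine plays the role of a pitched sound, uniform white noise plays the role of the noise file, and `zcr` is a hypothetical helper rather than librosa's function.

```python
import numpy as np

def zcr(frame):
    """Fraction of adjacent sample pairs in `frame` whose sign differs."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

sr = 22050
t = np.arange(1024) / sr                # one frame of 1024 samples
rng = np.random.default_rng(0)          # fixed seed for repeatability

tone = np.sin(2 * np.pi * 220 * t)      # pitched stand-in (hypothetical)
noise = rng.uniform(-1, 1, size=1024)   # white-noise stand-in (hypothetical)

print(zcr(tone), zcr(noise))
```

The tone changes sign only a couple of times per period, so its ZCR is low, while white noise changes sign on roughly every other sample, giving a ZCR near 0.5.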