# Analysis of Audio Signals

## Signal analysis tasks & applications

- Tasks:
    - Feature analysis of audio signals
    - Model parameter estimation for sound synthesis/coding
    - Signal/noise detection
    - Source separation
    - Automatic transcription
- Many current and future applications:
    - Pitch correction
    - Audio restoration
    - Noise reduction
    - Automatic classification of audio (e.g., music/speech/commercial/silence)
    - Music understanding systems
        - Recognition of musical piece, style, composer, performer etc.

## Different forms for analysis

### Waveform
The waveform is the lowest-level signal representation. From it you can see:
- Attack time and other temporal features of simple signals
- Temporal envelope, decay rate, periodicity, smoothness?
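The sketch below shows how one temporal feature can be read off the waveform. It estimates attack time as the time the amplitude envelope takes to rise from 10% to 90% of its peak. That threshold convention, the moving-average envelope and the helper names `amplitude_envelope` and `attack_time` are assumptions for illustration. These notes do not define them.

```python
import numpy as np

def amplitude_envelope(x, win_len):
    """Crude amplitude envelope: moving average of |x| over win_len samples."""
    kernel = np.ones(win_len) / win_len
    return np.convolve(np.abs(x), kernel, mode="same")

def attack_time(x, fs, win_len=64, lo=0.1, hi=0.9):
    """Time (s) for the envelope to rise from lo*peak to hi*peak.

    The 10 %/90 % thresholds are a common convention, assumed here.
    """
    env = amplitude_envelope(x, win_len)
    peak = env.max()
    n_lo = np.argmax(env >= lo * peak)   # first sample above the low threshold
    n_hi = np.argmax(env >= hi * peak)   # first sample above the high threshold
    return (n_hi - n_lo) / fs

# Synthetic test tone: 440 Hz sine with a 50 ms linear attack, then a slow decay.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
attack = 0.05
shape = np.where(t < attack, t / attack, np.exp(-(t - attack) * 5))
x = shape * np.sin(2 * np.pi * 440 * t)
print(f"estimated attack time: {attack_time(x, fs) * 1000:.1f} ms")
```

With a linear 50 ms attack, the 10% to 90% rise should take roughly 40 ms. Residual ripple in the crude envelope shifts the estimate by a few milliseconds.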

### Spectrum
- Human hearing works as a spectral analyzer
- For single tones, a one-shot spectrum is useful: you can compute the FFT of the whole signal.
- With spectrum, you can see:
    - Partials as peaks
    - Harmonicity / inharmonicity
    - Fundamental frequency?
    - Noise content
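The sketch below computes a one-shot spectrum of a whole synthetic harmonic tone with NumPy's `rfft`. Partials are picked as local maxima of the magnitude spectrum. The test tone (220 Hz with five partials of amplitude 1/k), the Hann window and the 40 dB peak threshold are arbitrary example choices.

```python
import numpy as np

fs = 8000                      # sampling rate (Hz), chosen for the example
t = np.arange(fs) / fs         # 1 s of signal -> 1 Hz bin spacing
f0 = 220.0
# Harmonic test tone: partials k*f0 with amplitude 1/k (an arbitrary choice).
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

# One-shot spectrum of the whole (Hann-windowed) signal.
X = np.fft.rfft(x * np.hanning(len(x)))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
mag_db = 20 * np.log10(np.abs(X) + 1e-12)

# Partials show up as local maxima; keep those within 40 dB of the largest peak.
is_peak = (mag_db[1:-1] > mag_db[:-2]) & (mag_db[1:-1] > mag_db[2:])
is_peak &= mag_db[1:-1] > mag_db.max() - 40
peak_freqs = freqs[1:-1][is_peak]
print(peak_freqs)
```

With 1 s of signal the bins are 1 Hz apart, so each partial of this tone lands exactly on a bin.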

## DFT and FFT
With the FFT, the number of multiplications is $O(N\log N)$ instead of $O(N^2)$; e.g., for a 1024-point DFT, the speedup factor is about 100 ($N^2/(N\log_2 N) = 1024/10 \approx 100$).
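The sketch below makes the comparison concrete. It implements the DFT directly from its definition $X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-j2\pi kn/N}$ as an $N \times N$ matrix product and checks it against `numpy.fft.fft`. The helper name `dft` is hypothetical. The operation-count ratio ignores the constant factors of real FFT implementations, so it is only an order-of-magnitude estimate.

```python
import numpy as np

def dft(x):
    """Direct DFT from the definition: O(N^2) complex multiplications."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # N x N matrix of complex exponentials
    return W @ x

N = 1024
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

# Both compute the same transform; the FFT only reorganizes the arithmetic.
assert np.allclose(dft(x), np.fft.fft(x))

# Rough operation-count ratio N^2 / (N log2 N) behind the "about 100" figure.
speedup = N**2 / (N * np.log2(N))
print(f"N = {N}: N^2 / (N log2 N) = {speedup:.1f}")
```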

Both the DFT and the FFT (which computes the same transform) share these issues:
- No temporal information
    - Signal onsets/offsets cause smearing
- Peak shape depends on frequency (whether it falls exactly on a bin or between bins)
    - Wide main lobe
    - Confusing side lobes
    - Spectral leakage
- Rounding errors look like additional noise

### Spectral Leakage
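This phenomenon occurs when the Fourier transform is applied to a finite segment that is implicitly treated as periodic. Unless the segment contains a whole number of periods, the periodic extension is discontinuous at the segment boundaries, and energy spreads ("leaks") from the true frequency into neighboring bins. One can apply different windows, such as Hamming, Hann (hanning), Blackman or Kaiser, to reduce leakage. As a side effect, windowing generally reduces the signal energy and widens the main lobe.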
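The effect can be measured with a small experiment. The sketch places a sinusoid exactly on a bin center and then halfway between two bins, and analyses each case with a rectangular and a Hann window. The `leakage_fraction` metric (share of energy more than 2 bins away from the peak) is an ad-hoc measure made up for this illustration.

```python
import numpy as np

N = 256
n = np.arange(N)

def leakage_fraction(x, window):
    """Share of spectral energy lying more than 2 bins away from the peak bin.

    This metric is an ad-hoc illustration, not a standard measure.
    """
    P = np.abs(np.fft.rfft(x * window)) ** 2
    k_peak = np.argmax(P)
    near = np.abs(np.arange(len(P)) - k_peak) <= 2
    return P[~near].sum() / P.sum()

rect = np.ones(N)
hann = np.hanning(N)

for bins in (32.0, 32.5):   # on a bin center vs. halfway between two bins
    x = np.sin(2 * np.pi * bins * n / N)
    print(f"{bins:5.1f} bins: rect {leakage_fraction(x, rect):.4f}, "
          f"hann {leakage_fraction(x, hann):.4f}")

# Side effect: the window attenuates the signal (coherent gain of the Hann window).
print(f"mean of Hann window: {hann.mean():.3f}")
```

On the bin center the rectangular window shows essentially no leakage. Halfway between bins it leaks several percent of the energy, while the Hann window keeps almost all of it near the peak. The mean of the Hann window (about 0.5) shows the amplitude loss caused by windowing.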

## STFT
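The short-time Fourier transform (STFT) is a sequence of FFTs computed over short, windowed (usually overlapping) frames of the signal. It gives a representation in both the time and the frequency domain.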
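A minimal STFT can be sketched in NumPy by slicing the signal into overlapping frames, windowing each one and taking its FFT. The frame length (512), hop size (256) and Hann window are example values. The `stft` function is a hypothetical helper, not a library API.

```python
import numpy as np

def stft(x, frame_len=512, hop=256, window=None):
    """Short-time Fourier transform: one FFT per windowed, overlapping frame.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    frame_len, hop and the Hann default are example choices.
    """
    if window is None:
        window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Example: a tone that jumps from 440 Hz to 880 Hz halfway through.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))
S = stft(x)
freqs = np.fft.rfftfreq(512, d=1 / fs)
peak_per_frame = freqs[np.argmax(np.abs(S), axis=1)]
print(peak_per_frame[0], peak_per_frame[-1])
```

For the test signal, the strongest bin moves from about 440 Hz in the first frames to about 880 Hz in the last ones. A single FFT of the whole signal would show both components but not when each occurs.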

## Features of Audio Signals
- Duration
- Loudness
- Pitch
- Timbre

## Loudness Estimation
- Running RMS value
    - Time-varying estimate of the short-time signal level (its square is the mean signal power over the window)
- Convert to decibels
    - $20\log_{10} y(n)$, where $y(n)$ is the running RMS value
    - Human sensitivity to loudness follows approximately logarithmic relation
- An auditory model of loudness perception is needed in principle
    - Should account for frequency-dependent sensitivity of human hearing
        - For example, brightness affects loudness perception
- A recent recommendation is RLB (Revised Low-Frequency B-Weighting)

![rlb](images/rlb.png)
![rlb_2](images/rlb_2.png)
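The first two steps, running RMS and conversion to decibels, can be sketched as follows. The 50 ms window and the small `eps` guarding against $\log 0$ are assumptions. The helper names `running_rms` and `level_db` are hypothetical. No frequency weighting such as RLB is applied, so this measures signal level rather than perceived loudness.

```python
import numpy as np

def running_rms(x, win_len):
    """Running RMS: square root of the moving average of x^2 over win_len samples."""
    kernel = np.ones(win_len) / win_len
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))

def level_db(y, eps=1e-10):
    """Convert an amplitude-like quantity to decibels, 20*log10(y).

    eps avoids log(0) during silence; its value is an arbitrary choice.
    """
    return 20 * np.log10(y + eps)

fs = 8000
t = np.arange(fs) / fs
# A 1 kHz tone whose amplitude drops from 1.0 to 0.1 after 0.5 s.
x = np.where(t < 0.5, 1.0, 0.1) * np.sin(2 * np.pi * 1000 * t)
L = level_db(running_rms(x, win_len=400))   # 400 samples = 50 ms (example choice)
print(f"{L[1000]:.1f} dB -> {L[7000]:.1f} dB")
```

A tenfold drop in amplitude shows up as a 20 dB drop in level.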

## Pitch
- Pitch is the perceived fundamental frequency
    - F0 is a physical quantity - pitch is a subjective attribute
    - Pitch is the frequency that humans would sing, hum or whistle when asked about the height of a musical tone
    - Alternatively, a test subject can adjust the frequency of a sine wave to match a test tone
- For sine waves: pitch = F0
- Humans perceive pitch clearly even for very complex tones
    - Pitch of complex harmonic and even inharmonic tones (e.g., bells)
    - The "missing fundamental" is also strongly perceived (e.g., on the phone)
    - The auditory system tries to assign a pitch to all sounds
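Even a simple periodicity detector recovers a missing fundamental, because the remaining harmonics still repeat with the period of F0. The sketch below picks the lag of the autocorrelation maximum for a tone that contains only harmonics 2 to 4 of 200 Hz. It uses an assumed textbook autocorrelation method and is not a model of hearing. The search range of 60 to 1000 Hz and the helper name `acf_pitch` are example choices.

```python
import numpy as np

def acf_pitch(x, fs, fmin=60.0, fmax=1000.0):
    """Time-domain F0 estimate: lag of the autocorrelation maximum.

    Searches lags corresponding to fmin..fmax; the range is an example choice.
    """
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[tau] for tau >= 0
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    tau = lag_min + np.argmax(r[lag_min:lag_max + 1])   # period T in samples
    return fs / tau                                     # F0 = 1 / T

fs = 16000
t = np.arange(2048) / fs
# "Missing fundamental": only harmonics 2-4 of 200 Hz, no energy at 200 Hz itself.
x = sum(np.sin(2 * np.pi * k * 200 * t) for k in (2, 3, 4))
print(f"estimated F0: {acf_pitch(x, fs):.1f} Hz")
```

The estimate comes out at 200 Hz even though the signal has no energy at that frequency.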
    
### Pitch Extraction
- Pitch estimation methods were first developed for speech 
    - Today hundreds of estimation methods available
- Methods can be classified into two classes
    - 1) Time-domain methods: estimate the periodicity (period T)
    - 2) Frequency-domain methods: estimate the fundamental frequency F0
- Common problems of the algorithms
    - Large errors are usually octave errors (one octave up or down)
    - Pre- or post-processing may reduce errors
        - For example, compression or spectral whitening of input signal, median filtering of a sequence of F0 estimates
- The newest 'good' method is YIN
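The notes name YIN without describing it. The sketch below is a simplified, assumed reading of the core steps of the published algorithm (de Cheveigné and Kawahara, 2002): the difference function, the cumulative mean normalized difference, and an absolute threshold. The parabolic interpolation and best-local-estimate refinements are omitted. The threshold of 0.1 and the F0 search range are example values. A running median (`median_smooth`, a hypothetical helper) is also shown to illustrate post-processing against octave errors. It is applied to a made-up track of F0 estimates.

```python
import numpy as np

def yin_f0(frame, fs, fmin=60.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN F0 estimate for one frame (no parabolic interpolation).

    Steps: difference function, cumulative mean normalized difference (CMND),
    absolute threshold. threshold=0.1 and the F0 range are example values.
    """
    W = len(frame) // 2                         # integration window length
    tau_min = int(fs / fmax)
    tau_max = min(int(fs / fmin), W)
    # Step 1: difference function d(tau) = sum_j (x_j - x_{j+tau})^2 over W samples
    d = np.array([np.sum((frame[:W] - frame[tau:tau + W]) ** 2)
                  for tau in range(tau_max + 1)])
    # Step 2: CMND d'(tau) = d(tau) / ((1/tau) * sum_{j=1..tau} d(j)), d'(0) = 1
    cmnd = np.ones(tau_max + 1)
    running_sum = np.cumsum(d[1:])
    taus = np.arange(1, tau_max + 1)
    cmnd[1:] = d[1:] * taus / np.maximum(running_sum, 1e-12)
    # Step 3: first lag below the threshold, then walk down to its local minimum
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
    # Fallback: global minimum of the CMND if nothing falls below the threshold
    tau = tau_min + int(np.argmin(cmnd[tau_min:]))
    return fs / tau

def median_smooth(f0_track, width=5):
    """Running median over `width` frames (odd), with edge values repeated."""
    half = width // 2
    padded = np.pad(f0_track, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(f0_track))])

fs = 16000
t = np.arange(2048) / fs
frame = sum(np.sin(2 * np.pi * k * 220 * t) / k for k in (1, 2, 3))
print(f"YIN estimate: {yin_f0(frame, fs):.1f} Hz")

# Hypothetical frame-wise F0 estimates (Hz) with two isolated octave errors.
f0_track = np.array([220, 220, 440, 221, 219, 220, 110, 220, 222, 220], dtype=float)
print(median_smooth(f0_track))
```

Without interpolation the estimate is quantized to integer lags, giving about 219 Hz for the 220 Hz tone. The median filter replaces the isolated 440 Hz and 110 Hz values with values near 220 Hz taken from the neighboring frames.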