# Chapter 8: Room Impulse Response (RIR)

### 1. Time-domain

Time domain refers to the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various separate instants in the case of discrete time. An oscilloscope is a tool commonly used to visualize real-world signals in the time domain. A time-domain graph shows how a signal changes with time, whereas a frequency-domain graph shows how much of the signal lies within each given frequency band over a range of frequencies.

Though most precisely referring to time in physics, the term time domain may occasionally informally refer to position in space when dealing with spatial frequencies, as a substitute for the more precise term spatial domain.

![alternative text](..\images\Fig-8-0.png)

### 2. IR Measurement

##### External Impulse

#### How to measure an impulse response

Interpreting impulse responses is an important part of acoustic analysis. An impulse response measurement can tell us a great deal about a room and the way sound will be reproduced within it. It can show us what kinds of treatment will be helpful and whether treatments have been correctly applied to achieve the best results. This section explains impulse responses, the information that can be extracted from them, and how REW (Room EQ Wizard) can measure and analyse such responses.

Before we can get very far in interpreting an impulse response we need to understand what an impulse response is. The impulse response is in essence a recording of what it would sound like in the room if you played an extremely loud, extremely short click - something like the crack of a pistol shot. The reason for measuring the impulse response (by more subtle means than firing a gun in the room) is that it completely characterises the behaviour of the system consisting of the speaker(s) that were measured and the room they are in, at the point where the measurement microphone is placed. An important property of an impulse, not intuitively obvious, is that if you break it up into individual sine waves you find that it contains all frequencies at the same amplitude. Strange but true. This means that you can work out a system's frequency response by determining the frequency components that make up its impulse response. REW does this by Fourier transforming the impulse response, which in essence breaks it up into its individual frequency components. The plot of the magnitude of each of those frequency components is the system's frequency response.
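The two claims in this paragraph can be checked numerically. The sketch below is only an illustration with NumPy, not REW's implementation: the sample rate, length and the toy impulse response (direct sound plus one reflection) are assumed values.

```python
import numpy as np

fs = 48_000                      # sample rate in Hz (assumed)
n = 1024                         # number of samples (assumed)

# Ideal unit impulse: a single non-zero sample.
impulse = np.zeros(n)
impulse[0] = 1.0
impulse_mag = np.abs(np.fft.rfft(impulse))   # equal magnitude in every bin

# Hypothetical impulse response: direct sound plus one reflection
# arriving 48 samples (1 ms at 48 kHz) later at half the amplitude.
ir = np.zeros(n)
ir[0] = 1.0
ir[48] = 0.5

freqs = np.fft.rfftfreq(n, d=1.0 / fs)       # bin frequencies in Hz
response_db = 20 * np.log10(np.abs(np.fft.rfft(ir)))  # frequency response
```

Plotting `response_db` against `freqs` would show the comb-filter ripple that a single reflection adds to an otherwise flat response.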

When an impulse response is measured by means of a logarithmically swept sine wave, the room's linear response is conveniently separated from its non-linear response. The portion of the response before the initial peak at time=0 is actually due to the system's distortion - looking closely, there are scaled down, horizontally compressed copies of the main impulse response there - each of those copies is due to a distortion harmonic, first the 2nd harmonic, then the third, then the fourth etc. as time gets more negative. The initial peak and its subsequent decay after time=0 is the system's response without the distortion.
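As a rough illustration of a swept-sine measurement, the sketch below generates an exponential sweep, passes it through a simulated linear system and recovers the impulse response by regularised spectral division. This is one common deconvolution approach, not necessarily the one REW uses, and it models only the linear part: the harmonic-distortion copies at negative time are not simulated. All numeric values and the toy system are assumptions.

```python
import numpy as np

fs = 8_000                 # sample rate in Hz (assumed)
T = 1.0                    # sweep duration in seconds (assumed)
f1, f2 = 20.0, fs / 2      # start and end frequency in Hz (assumed)

# Exponential (logarithmic) sine sweep from f1 to f2.
t = np.arange(int(T * fs)) / fs
rate = np.log(f2 / f1)
sweep = np.sin(2 * np.pi * f1 * T / rate * (np.exp(t / T * rate) - 1.0))

# Hypothetical linear system: direct sound plus one reflection at 30 ms.
h_true = np.zeros(400)
h_true[0] = 1.0
h_true[240] = 0.5
recorded = np.convolve(sweep, h_true)     # what the microphone would capture

# Deconvolve by regularised spectral division.
n_fft = len(recorded)
S = np.fft.rfft(sweep, n_fft)
R = np.fft.rfft(recorded, n_fft)
eps = 1e-8 * np.max(np.abs(S)) ** 2       # keeps the division well behaved
h_est = np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + eps), n_fft)[: len(h_true)]
```

In this noise-free simulation `h_est` reproduces the direct peak and the reflection closely; with a real recording, the regularisation term and the sweep's band limits would matter much more.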

![alternative text](..\images\Fig-8-0-1.png)

In a perfect system of infinite bandwidth with totally absorbent boundaries, the impulse response would look like a single spike at time 0 and nothing anywhere else - the closest you get to that is measuring the soundcard's loopback response. In a real system, finite bandwidth spreads out the response (dramatically so when measuring a subwoofer as its bandwidth is very limited). Reflections from the room's boundaries add to the initial response at times that correspond to how much further they had to travel to reach the microphone - for example, if the microphone were 10 feet from the speaker and a sound reflection from a wall had to travel 15 feet to reach the microphone, that reflection would contribute a spike (smeared out depending on the nature of the reflection) about 4.4 ms after the initial peak, because sound travels roughly 1.1 feet per millisecond and so takes about 4.4 ms to cover that extra 5 feet.
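The arrival time of a reflection follows directly from the extra path length. A minimal sketch, assuming a speed of sound of 343 m/s (air at roughly 20 °C); the function name is illustrative only:

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees Celsius
FEET_TO_METRES = 0.3048

def reflection_delay_ms(direct_ft: float, reflected_ft: float) -> float:
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    extra_m = (reflected_ft - direct_ft) * FEET_TO_METRES
    return 1000.0 * extra_m / SPEED_OF_SOUND_M_S

# The example from the text: 10 ft direct path, 15 ft reflected path.
delay = reflection_delay_ms(10.0, 15.0)
print(f"{delay:.1f} ms")
```

For the 10 ft and 15 ft paths this gives a delay a little under 5 ms.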

When measuring full range responses from loudspeakers (rather than subwoofer responses) the reflections are easier to spot as the higher bandwidth of the full range system keeps the spike of the impulse (and the reflections) quite narrow, but you need to zoom in on the time axis to see them. They are easier to spot with a linear Y axis (set to %FS instead of dBFS) and also show up more readily with the ETC smoothing set to 0.


##### Internal
#### - MLS

A maximum length sequence (MLS) is a type of pseudorandom binary sequence.

They are bit sequences generated using maximal linear-feedback shift registers and are so called because they are periodic and reproduce every binary sequence (except the zero vector) that can be represented by the shift registers (i.e., for length-$m$ registers they produce a sequence of length $2^m - 1$). An MLS is also sometimes called an n-sequence or an m-sequence. MLSs are spectrally flat, with the exception of a near-zero DC term.

These sequences may be represented as coefficients of irreducible polynomials in a polynomial ring over Z/2Z.

Practical applications for MLS include measuring impulse responses (e.g., of room reverberation or arrival times from towed sources in the ocean[1]). They are also used as a basis for deriving pseudo-random sequences in digital communication systems that employ direct-sequence spread spectrum and frequency-hopping spread spectrum transmission systems, optical dielectric multilayer reflector design,[2] and in the efficient design of some fMRI experiments.
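The following sketch shows how an MLS can be generated with a linear-feedback shift register and why it is useful for impulse response measurement: its circular autocorrelation is almost a perfect impulse, so cross-correlating the system output with the sequence recovers the impulse response. The register order, tap positions (taken from standard LFSR tables) and the toy system are assumptions chosen for illustration.

```python
import numpy as np

def mls(order, taps):
    """One period of an MLS as a +/-1 sequence, from a Fibonacci LFSR."""
    state = [1] * order                  # any non-zero start state works
    bits = []
    for _ in range(2 ** order - 1):
        bits.append(state[-1])           # output the oldest bit
        feedback = 0
        for tap in taps:                 # XOR of the tapped register cells
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]  # shift the register by one
    return 1.0 - 2.0 * np.array(bits)    # map bit 0 -> +1, bit 1 -> -1

def circular_xcorr(a, b):
    """Circular cross-correlation of a with b, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))

s = mls(10, (10, 7))                     # assumed taps for a 10-bit register
n = len(s)                               # 2**10 - 1 = 1023 samples
autocorr = circular_xcorr(s, s)          # n at lag 0, -1 at every other lag

# Hypothetical system: direct sound plus a reflection 40 samples later.
h_true = np.zeros(n)
h_true[0], h_true[40] = 1.0, 0.5
y = np.real(np.fft.ifft(np.fft.fft(s) * np.fft.fft(h_true)))  # circular conv.
h_est = circular_xcorr(y, s) / (n + 1)   # leaves an offset of -sum(h)/(n + 1)
```

The small constant offset in `h_est` comes from the -1 floor of the autocorrelation and shrinks as the sequence gets longer.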

#### - lin Sweep

A linear sweep (lin-sweep) is a sine signal whose instantaneous frequency rises at a constant rate, from a start frequency $f_1$ to an end frequency $f_2$ over the sweep duration. Because it spends the same amount of time on every hertz of bandwidth, its energy is spread evenly over frequency, giving a flat (white) magnitude spectrum.

As with an MLS, the impulse response is obtained by deconvolving the recorded response with the excitation signal.

#### - e-Sweep

#### MLS versus Sweep

![alternative text](..\images\Fig-8-1.png)

#### e-Sweep versus lin-Sweep

![alternative text](..\images\Fig-8-2.png)

#### e-Sweep 

![alternative text](..\images\Fig-8-3.png)

#### Lin-Sweep

![alternative text](..\images\Fig-8-4.png)

### 3. Related concepts

#### Flutter

In electronics and communication, flutter is the rapid variation of signal parameters, such as amplitude, phase, and frequency. Examples of electronic flutter are:

- Rapid variations in received signal levels, such as variations that may be caused by atmospheric disturbances, antenna movements in a high wind, or interaction with other signals.
- In radio propagation, a phenomenon in which nearly all radio signals that are usually reflected by ionospheric layers in or above the E-region experience partial or complete absorption.
- In radio transmission, rapidly changing signal levels, together with variable multipath time delays, caused by reflection and possible partial absorption of the signal by aircraft flying through the radio beam or common scatter volume.
- The variation in the transmission characteristics of a loaded telephone line caused by the action of telegraph direct currents on the loading coils.
- In recording and reproducing equipment, the deviation of frequency caused by irregular mechanical motion, e.g., that of capstan angular velocity in a tape transport mechanism, during operation.


![alternative text](..\images\Fig-8-5.png)


#### Waterfall plot

The Waterfall plot in Liberty Audiosuite or IMP is more properly known as the "Cumulative Spectral Decay" (CSD) plot. This plot technique is generally credited to Fincham and Berman (of KEF), who used it to detect resonances and internal box reflections in loudspeakers.

![alternative text](..\images\Fig-8-6.png)

A waterfall is a presentation of both frequency domain and time domain data on a single graph. Time domain data is voltage or pressure as a function of time, usually in the form of a measured impulse response (originating from a pulse or MLS measurement), which covers all time (but can be assumed to decay to insignificant levels within a finite time). The frequency domain version is the decomposition of the time domain impulse response into periodic cosine waveforms via Fourier analysis (the impulse response can be represented as a summation of an infinite number of cosine waves of different frequencies). In any combined time-frequency analysis, there are inherent resolution limitations due to the reciprocal relationship of time (seconds) and frequency (per second). One cannot, for instance, talk about a frequency at a point in time -- it is rather meaningless to discuss a periodic wave unless (at the very least) the time length of that period is considered. Hence, a frequency component cannot be said to start or stop at a specific time. But a band of frequencies can be analyzed in terms of its energy within a given time segment.

One relatively obvious way to do this is to select (or "window") only a portion of a time signal and perform a Fourier analysis over that section only, as if the time signal were zero elsewhere. This does generate some problems in that extraneous frequency components can be erroneously created by suddenly chopping a non-zero section of a signal to zero at the edges of the window. Use of windows which are tapered, on both edges of the time segment or on only one, can help to reduce (but not eliminate) this effect.

If the time length of the segment is kept constant, but its position in the time continuum is varied as one axis of the plot and the resulting spectrum of the Fourier analysis is shown versus the remaining axes, a plot known as the Short-Term Fourier Spectrogram results. This plot is an attempt to show "frequency response versus time". It has the disadvantage that it can contain no valid information for frequencies below 1/(time segment length); therefore, to get data down to 500 Hz, the segment must be at least 2 msec long. If, however, an echo occurs in a real-world measurement at 3 msec after the beginning of the impulse response, the time segment can be swept over only 1 msec if the echo characteristic is not to be included; this wouldn’t give much of a span for the time axis. If shorter time spans are analyzed, a frequency-vs-time plot can be made over a longer time length, but can only show data for very high frequencies and at poor resolution.

If, on the other hand, the entire active time trace is included for the initial transformation and then the position of the later edge of the window is held fixed relative to the beginning point of the impulse response, and only the earlier edge is varied, a CSD or LAUD waterfall plot results. Because the length of each time segment is being shortened with each successive step in the "time sweep", the lowest resolvable frequency increases (loses resolution) at later points in the plot. At the first traces of the plot, frequencies down to the anechoic limit can be displayed; successive curves will be valid to approximately 1/( windowed time span). In IMP and Audiosuite (and PRAXIS), data below the LF resolution cutoff are not plotted, resulting in an easily identified drop edge on the later traces of the plot; some other packages plot such below-resolution LF data anyway, although it is not meaningful and can lead to incorrect conclusions based on information which simply isn’t there.
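The construction described above (later window edge fixed, earlier edge stepped forward) can be sketched as follows. This is a simplified illustration, not the algorithm used by IMP, Audiosuite or PRAXIS: it uses an untapered leading edge, and the function name, slice count, step size and test signal are assumptions.

```python
import numpy as np

def csd(ir, fs, n_slices=20, step_ms=0.25):
    """Cumulative spectral decay: one magnitude spectrum (dB) per time slice."""
    ir = np.asarray(ir, dtype=float)
    n = len(ir)
    step = int(round(step_ms * 1e-3 * fs))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    levels_db, lf_limits_hz = [], []
    for k in range(n_slices):
        start = k * step
        segment = ir.copy()
        segment[:start] = 0.0             # move the earlier edge; later edge stays
        levels_db.append(20 * np.log10(np.abs(np.fft.rfft(segment)) + 1e-12))
        lf_limits_hz.append(fs / (n - start))   # about 1 / (windowed time span)
    times_ms = np.arange(n_slices) * step_ms
    return freqs, times_ms, np.array(levels_db), np.array(lf_limits_hz)

# Hypothetical impulse response: a single resonance at 1 kHz decaying over 5 ms.
fs = 48_000
t = np.arange(4096) / fs
ir = np.exp(-t / 0.005) * np.sin(2 * np.pi * 1000 * t)
freqs, times_ms, levels_db, lf_limits_hz = csd(ir, fs)
```

Each row of `levels_db` is one trace of the waterfall, and `lf_limits_hz` rises from trace to trace, which is the loss of low-frequency resolution the text describes.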

The CSD waterfall does NOT show frequency response versus time! It shows the (approximate) contribution to the total frequency response that occurs after the (relative) time shown on the time axis. At t=0 on the CSD plot, the entire frequency response is drawn, as the total response occurs after this time. At t=1msec the CSD plot shows the contribution to the frequency response which occurs after 1 msec but not before 1msec, and so on. But note the caution stated above: frequencies cannot start at a precise time! -- CSDs often show the user’s technique more than the speaker’s quality! Also note that if an echo pulse is included in the time segment selected for the waterfall plot, the frequency contribution of the echo will be present in the plot for all times before the echo occurs.

CSD waterfalls are most often used to detect and display resonant behavior in speaker cones, boxes or horns. A resonance will show up as a long decaying ridge along the time axis, due to the "ringing" of the resonance over time. The data shown is VERY SUSCEPTIBLE to measurement and display conditions. The inherent windowing operation is hacking into the most active part of the impulse response, the result of which will change depending on the window shape used, on the step size being used for the waterfall graphing routine, on display scales, on the low-frequency (even below resolution) content and phase, etc. Each curve trace on its own cannot be said to carry much useful information -- it is the overall plot, the ridges, shelves and valleys of the overall surface, which is revealing, certainly in a qualitative way but only slightly in a quantitative one.

If one, for whatever reason, wanted to find the waterfall curve value at a point in time and at a specified frequency and read off a dB value for it, one need only recreate the window edge condition for the waterfall plot and perform the corresponding FFT. For example, assume a waterfall plot has been made normally by setting the time markers so that marker number 1 is just before the activity in the time impulse response and marker number 2 is just before the first significant reflection. You want to read off a value at the waterfall plot’s "1.4 msec" trace, and at 3.3kHz. Merely go back to the time domain plot (by pressing F1). Move marker number 1 to a position 1.4msec to the right of its current position, go to the transform menu and select "FFT". The curve corresponding to the 1.4msec trace of the waterfall plot will result, and you can use the frequency domain markers to read off any values of interest.

#### Spectrogram

A spectrogram is a visual way of representing the signal strength, or “loudness”, of a signal over time at various frequencies present in a particular waveform.  Not only can one see whether there is more or less energy at, for example, 2 Hz vs 10 Hz, but one can also see how energy levels vary over time.  In other sciences spectrograms are commonly used to display frequencies of sound waves produced by humans, machinery, animals, whales, jets, etc., as recorded by microphones.  In the seismic world, spectrograms are increasingly being used to look at frequency content of continuous signals recorded by individual or groups of seismometers to help distinguish and characterize different types of earthquakes or other vibrations in the earth. 

![alternative text](..\images\Fig-8-7.png)

##### How do you read a spectrogram?

Spectrograms are basically two-dimensional graphs, with a third dimension represented by colors. Time runs from left (oldest) to right (youngest) along the horizontal axis. In the seismic spectrograms this description is drawn from, each panel shows 10 minutes of data, with tick marks along the horizontal axis at 1-minute intervals. The vertical axis represents frequency, which can also be thought of as pitch or tone, with the lowest frequencies at the bottom and the highest frequencies at the top. The amplitude (or energy or “loudness”) of a particular frequency at a particular time is represented by the third dimension, color, with dark blues corresponding to low amplitudes and brighter colors up through red corresponding to progressively stronger (or louder) amplitudes.
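The data behind such a plot can be computed with a short-time Fourier transform. The sketch below is a minimal NumPy version under assumed parameters (Hann window, 256-sample frames, 50 % overlap) and a synthetic two-tone test signal; it returns the frequency axis, the time axis and the level that would be mapped to color.

```python
import numpy as np

def spectrogram(x, fs, win_len=256, hop=128):
    """Magnitude spectrogram in dB: rows are frequency bins, columns are frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)            # vertical axis, Hz
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # horizontal axis, s
    return freqs, times, 20 * np.log10(mag.T + 1e-12)

fs = 8_000
t = np.arange(2 * fs) / fs
# Test signal: a 500 Hz tone in the first second, a 2000 Hz tone in the second.
x = np.where(t < 1.0, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))
freqs, times, s_db = spectrogram(x, fs)
```

Drawing `s_db` as a color map over `times` and `freqs` would show a bright band at 500 Hz that jumps to 2000 Hz halfway along the time axis.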

### 4. Microphones

##### Omnidirectional

The prefix “omni-” comes from Latin and means “all”. An omnidirectional microphone therefore captures sound equally from all directions. This means that whether you’re in front, behind, or on one side of the mic, it records the signal with equal strength.

The omni microphone is well suited to situations where sound needs to be detected and recorded from different directions or locations. Those used very close to the source, such as lavaliers, headsets, and earsets, are usually omnidirectional microphones.

##### Hypercardioid

The hypercardioid is an often misunderstood relative of the well-known cardioid microphone polar pattern and is often confused with the supercardioid pattern. Understanding hypercardioid polar patterns and their ideal applications will improve your efficiency both on stage and in the studio.

What is a hypercardioid microphone? A hypercardioid microphone has a very directional hypercardioid polar/pickup pattern. It is most sensitive to on-axis sounds (where the mic “points”) with null points at 110° and 250° and a rear lobe of sensitivity. Hypercardioid mics are popular in film due to their high directionality.


##### Cardioid

A cardioid microphone is characterized as a unidirectional microphone. The polar pattern of the cardioid exhibits full sensitivity on-axis, whereas the sensitivity at $180°$ is in principle $-\infty\ \mathrm{dB}$; in practice the rear attenuation is more than $20\ \mathrm{dB}$.

The $-3\ \mathrm{dB}$ angle is at $\pm 65.5°$. The $-6\ \mathrm{dB}$ angle is at $\pm 90°$.

In principle, this characteristic can be obtained by combining equal parts of the output of an omnidirectional microphone and a bidirectional microphone.

The polar equation is: $0.5\,(1 + \cos\phi)$
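The quoted angles can be checked directly from the polar equation. A short sketch (function names are illustrative only):

```python
import numpy as np

def cardioid(phi_deg):
    """Cardioid sensitivity 0.5 * (1 + cos(phi)), linear scale, on-axis = 1."""
    return 0.5 * (1.0 + np.cos(np.radians(phi_deg)))

def angle_for_level(level_db):
    """Off-axis angle in degrees at which the cardioid is down by level_db."""
    target = 10 ** (level_db / 20.0)           # dB -> linear amplitude
    return np.degrees(np.arccos(2.0 * target - 1.0))

angle_3db = angle_for_level(-3.0)   # close to 65.5 degrees
angle_6db = angle_for_level(-6.0)   # close to 90 degrees
rear = cardioid(180.0)              # null at the rear
```

The small deviations from 65.5° and 90° come from rounding -3 dB and -6 dB; the exact half-power and half-amplitude points fall on those angles.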

##### Supercardioid



##### Bidirectional


### 5. Parameters

A closer inspection of Eq. 5 shows that the natural frequencies of a rectangular room may be interpreted in a geometrical way. Small rooms include listening rooms, sound studios, recording studios and audiometry rooms, which are used to transmit music and speech, to carry out sound assessment and to perform measurements. Large rooms or halls include concert halls, theaters, auditoriums and lecture halls.

For the large-room condition, Eq. 7 is used to estimate the Schroeder frequency.

For small rooms, some expected values are:

$V = 25\ m^3$

$T = 1\ s$

$f_s = 400\ Hz$

For large rooms/halls:

$V = 13000\ m^3$

$T = 2\ s$

$f_s = 25\ Hz$
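The listed values are consistent with the commonly used Schroeder estimate $f_s \approx 2000\sqrt{T/V}$, with $T$ in seconds and $V$ in cubic metres. That Eq. 7 has this form is an assumption, since the equation is not reproduced in this section; the sketch below simply evaluates the estimate for both sets of values.

```python
import math

def schroeder_frequency(t60_s, volume_m3):
    """Schroeder frequency in Hz, using the estimate 2000 * sqrt(T / V)."""
    return 2000.0 * math.sqrt(t60_s / volume_m3)

fs_v13000 = schroeder_frequency(2.0, 13000.0)   # about 25 Hz
fs_v25 = schroeder_frequency(1.0, 25.0)         # about 400 Hz
```

The 13000 m³ volume gives roughly 25 Hz and the 25 m³ volume gives 400 Hz, so modal behaviour dominates a much wider frequency range in the smaller volume.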

### 6. Room acoustics parameters (ISO 3382)

#### Strength G


#### Reverberation Time T_60


#### Early Decay Time EDT


#### Reverberation time T_20




#### Clarity C_80



#### Definition D_50


#### Centre Time


#### Early Lateral Energy LF


#### Early Lateral Energy LFC


#### Interaural Crosscorrelation IACC


#### Speech Intelligibility


| | |
|---|---|
| $$D = \frac{\int_0^{50\,\mathrm{ms}} [g(t)]^2 \, dt}{\int_0^\infty [g(t)]^2 \, dt} \cdot 100\,\% $$ | **[8]** |
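Equation [8] can be evaluated on a sampled impulse response by replacing the integrals with sums. The sketch below assumes that the impulse response $g(t)$ starts at the arrival of the direct sound, and uses a synthetic exponential decay with an assumed reverberation time of 1 s as test data.

```python
import numpy as np

def definition_d50(ir, fs):
    """Definition D in percent: energy in the first 50 ms over total energy."""
    energy = np.asarray(ir, dtype=float) ** 2
    n50 = int(round(0.050 * fs))           # number of samples in the first 50 ms
    return 100.0 * energy[:n50].sum() / energy.sum()

fs = 48_000
t = np.arange(2 * fs) / fs
t60 = 1.0                                  # assumed reverberation time in s
ir = 10 ** (-3.0 * t / t60)                # amplitude falls by 60 dB in t60
d50 = definition_d50(ir, fs)               # about 50 % for this decay
```

For a purely exponential decay the result depends only on the reverberation time; measured responses also contain discrete early reflections that raise or lower D.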



### 7. Speech transmission Index

#### Signal modulation reduction


#### Modulation reduction

### 8. Speech Intelligibility

#### Speech Transmission Index: STI

| | |
|---|---|
| $$ Q = Q_s Q_r $$ | |
| $$L_p = L_W + 10 \lg \frac{Q}{4\pi r^2} $$ | **[21]** |
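Equation [21] can be evaluated with a few lines of code. In this sketch $r$ is taken in metres and $Q$ is passed in as the product $Q_s Q_r$ from the table; the sound power level of 100 dB and the factor values are assumed example numbers.

```python
import math

def sound_pressure_level(lw_db, q, r_m):
    """Level L_p = L_W + 10*lg(Q / (4*pi*r^2)) in dB, with r in metres."""
    return lw_db + 10.0 * math.log10(q / (4.0 * math.pi * r_m ** 2))

lp_1m = sound_pressure_level(100.0, 1.0, 1.0)        # about 89 dB
lp_2m = sound_pressure_level(100.0, 1.0, 2.0)        # about 6 dB lower
q_s, q_r = 2.0, 1.0                                  # assumed example factors
lp_q2 = sound_pressure_level(100.0, q_s * q_r, 1.0)  # about 3 dB higher
```

Doubling the distance lowers the level by about 6 dB, and doubling $Q$ raises it by about 3 dB.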

### References

1. Kuttruff, H. Acoustics: An Introduction. Taylor & Francis, 2007.
2. "What is Directivity?" McSquared System Design Group, Inc. - Consultants in Sound/Video/Acoustics. http://www.mcsquared.com/directvt.htm (accessed February 12, 2023).
