# Chapter 8: Room Impulse Response (RIR)

### 1. Time-domain

Time domain refers to the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various separate instants in the case of discrete time. An oscilloscope is a tool commonly used to visualize real-world signals in the time domain. A time-domain graph shows how a signal changes with time, whereas a frequency-domain graph shows how much of the signal lies within each given frequency band over a range of frequencies.

Though most precisely referring to time in physics, the term time domain may occasionally informally refer to position in space when dealing with spatial frequencies, as a substitute for the more precise term spatial domain.

![alternative text](..\images\Fig-8-0.png)

### 2. IR Measurement

##### External impulse

#### How to measure an impulse response

Interpreting impulse responses is an important part of acoustic analysis. An impulse response measurement can tell us a great deal about a room and the way sound will be reproduced within it. It can show us what kinds of treatment will be helpful and whether treatments have been correctly applied to achieve the best results. This section explains impulse responses, the information that can be extracted from them, and how REW (Room EQ Wizard) can measure and analyse such responses.

Before we can get very far in interpreting an impulse response we need to understand what an impulse response is. The impulse response is in essence a recording of what it would sound like in the room if you played an extremely loud, extremely short click - something like the crack of a pistol shot. The reason for measuring the impulse response (by more subtle means than firing a gun in the room) is that it completely characterises the behaviour of the system consisting of the speaker(s) that were measured and the room they are in, at the point where the measurement microphone is placed. An important property of an impulse, not intuitively obvious, is that if you break it up into individual sine waves you find that it contains all frequencies at the same amplitude. Strange but true. This means that you can work out a system's frequency response by determining the frequency components that make up its impulse response. REW does this by Fourier transforming the impulse response, which in essence breaks it up into its individual frequency components. The plot of the magnitude of each of those frequency components is the system's frequency response.
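
The two claims above (an impulse contains all frequencies at equal amplitude, and the frequency response is the magnitude of the Fourier transform of the impulse response) can be checked numerically. The following is a minimal sketch using NumPy, not REW's actual processing; the sample rate, length and the toy "room" with a single reflection are assumptions made for illustration.

```python
import numpy as np

fs = 48_000   # sample rate in Hz (assumed for this sketch)
n = 1024      # number of samples (assumed)

# Ideal impulse: one unit sample followed by silence.
impulse = np.zeros(n)
impulse[0] = 1.0
impulse_mag = np.abs(np.fft.rfft(impulse))   # magnitude is 1 in every bin

# Toy "room": direct sound plus one reflection 2 ms later at half amplitude.
h = np.zeros(n)
h[0] = 1.0
h[int(round(0.002 * fs))] = 0.5

freqs = np.fft.rfftfreq(n, d=1.0 / fs)                  # bin frequencies in Hz
response_db = 20.0 * np.log10(np.abs(np.fft.rfft(h)))   # magnitude response in dB

print(impulse_mag.min(), impulse_mag.max())
print(response_db.min(), response_db.max())
```

The single reflection already produces a comb-shaped response that swings between roughly +3.5 dB and -6 dB, which is why reflections matter so much when interpreting a measured response.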

When an impulse response is measured by means of a logarithmically swept sine wave, the room's linear response is conveniently separated from its non-linear response. The portion of the response before the initial peak at time=0 is actually due to the system's distortion - looking closely, there are scaled down, horizontally compressed copies of the main impulse response there - each of those copies is due to a distortion harmonic, first the 2nd harmonic, then the third, then the fourth etc. as time gets more negative. The initial peak and its subsequent decay after time=0 is the system's response without the distortion.
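
As a rough illustration of the swept-sine method, the sketch below generates an exponential (logarithmic) sweep, passes it through a toy linear system, and recovers the impulse response by convolving the recording with an amplitude-compensated, time-reversed copy of the sweep. All parameter values and the toy system (a pure delay with gain) are assumptions for this example. The system here is purely linear, so the distortion products that would appear before the main peak are not modelled.

```python
import numpy as np

fs = 8000                  # sample rate in Hz (assumed)
T = 1.0                    # sweep duration in s (assumed)
f1, f2 = 50.0, 3000.0      # start and stop frequencies in Hz (assumed)

n = int(fs * T)
t = np.arange(n) / fs
L = np.log(f2 / f1)

# Exponential sweep: the instantaneous frequency rises from f1 to f2.
sweep = np.sin(2 * np.pi * f1 * T / L * (np.exp(t * L / T) - 1.0))

# Inverse filter: time-reversed sweep with a decaying envelope that
# compensates the sweep's extra energy at low frequencies.
inverse = sweep[::-1] * np.exp(-t * L / T)

# Toy linear system: 5 ms delay with a gain of 0.8.
delay = int(round(0.005 * fs))
h_true = np.zeros(delay + 1)
h_true[delay] = 0.8
recorded = np.convolve(sweep, h_true)

# Linear convolution of the recording with the inverse filter via the FFT.
size = len(recorded) + n - 1
ir = np.fft.irfft(np.fft.rfft(recorded, size) * np.fft.rfft(inverse, size), size)

# The linear response peaks close to index (n - 1) + delay.
peak = int(np.argmax(np.abs(ir)))
print(peak, (n - 1) + delay)
```

In a real measurement the part of `ir` before index `n - 1` would hold the harmonic distortion copies described above, and the part from `n - 1` onwards would hold the linear impulse response.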

![alternative text](..\images\Fig-8-0-1.png)

In a perfect system of infinite bandwidth with totally absorbent boundaries, the impulse response would look like a single spike at time 0 and nothing anywhere else - the closest you get to that is measuring the soundcard's loopback response. In a real system, finite bandwidth spreads out the response (dramatically so when measuring a subwoofer as its bandwidth is very limited). Reflections from the room's boundaries add to the initial response at times that correspond to how much further they had to travel to reach the microphone - for example, if the microphone were 10 feet from the speaker and a sound reflection from a wall had to travel 15 feet to reach the microphone, that reflection would contribute a spike (smeared out depending on the nature of the reflection) about 5 ms after the initial peak, because sound takes about 5 ms to travel that extra 5 feet.
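
The arrival time of a reflection follows directly from the extra path length. A small sketch of that arithmetic, assuming a speed of sound of about 1130 ft/s (a typical room-temperature value that is not stated in the text):

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0   # approximate value for air at room temperature (assumed)

def reflection_delay_ms(direct_ft, reflected_ft, c=SPEED_OF_SOUND_FT_PER_S):
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    extra_path_ft = reflected_ft - direct_ft
    return extra_path_ft / c * 1000.0

delay = reflection_delay_ms(10.0, 15.0)
print(f"{delay:.1f} ms")
```

With these numbers the delay comes out slightly under 4.5 ms, consistent with the rounded "about 5 ms" quoted above.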

When measuring full range responses from loudspeakers (rather than subwoofer responses) the reflections are easier to spot as the higher bandwidth of the full range system keeps the spike of the impulse (and the reflections) quite narrow, but you need to zoom in on the time axis to see them. They are easier to spot with a linear Y axis (set to %FS instead of dBFS) and also show up more readily with the ETC smoothing set to 0.


##### Internal
#### - MLS

A maximum length sequence (MLS) is a type of pseudorandom binary sequence.

They are bit sequences generated using maximal linear-feedback shift registers and are so called because they are periodic and reproduce every binary sequence (except the zero vector) that can be represented by the shift registers (i.e., for length-$m$ registers they produce a sequence of length $2^m - 1$). An MLS is also sometimes called an n-sequence or an m-sequence. MLSs are spectrally flat, with the exception of a near-zero DC term.

These sequences may be represented as coefficients of irreducible polynomials in a polynomial ring over Z/2Z.

Practical applications for MLS include measuring impulse responses (e.g., of room reverberation or arrival times from towed sources in the ocean). They are also used as a basis for deriving pseudo-random sequences in digital communication systems that employ direct-sequence spread spectrum and frequency-hopping spread spectrum transmission systems, in optical dielectric multilayer reflector design, and in the efficient design of some fMRI experiments.
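
A minimal sketch of an MLS generator built from a linear-feedback shift register is shown below. The function name `mls_bits` and the tap choice `(4, 3)` for a 4-bit register are assumptions for illustration; other register lengths need their own maximal-length taps. The circular autocorrelation at the end illustrates why an MLS is useful for impulse response measurement: it is close to an ideal impulse.

```python
import numpy as np

def mls_bits(m, taps):
    """Generate one period (2**m - 1 bits) of an LFSR sequence."""
    state = [1] * m                       # any non-zero start state works
    bits = []
    for _ in range(2 ** m - 1):
        bits.append(state[-1])            # output the last register stage
        feedback = 0
        for tap in taps:                  # XOR of the tapped stages (1-based)
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]   # shift and insert the feedback bit
    return np.array(bits)

bits = mls_bits(4, taps=(4, 3))           # tap choice assumed maximal for m = 4
x = 1 - 2 * bits                          # map bits 0/1 to samples +1/-1

# Circular autocorrelation: a peak of N at lag 0 and -1 at every other lag.
autocorr = np.array([np.sum(x * np.roll(x, k)) for k in range(len(x))])
print(len(bits), autocorr)
```

Because the autocorrelation is almost a perfect impulse, cross-correlating the recorded room signal with the emitted MLS yields the room's impulse response.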

#### - lin Sweep

A linear sweep is a sine wave whose instantaneous frequency rises at a constant rate (in Hz per second) from a start frequency to a stop frequency. Because it spends the same amount of time in every hertz of bandwidth, its magnitude spectrum is flat (white).

An exponential (logarithmic) sweep, by contrast, spends the same amount of time in every octave, so its spectrum falls by 3 dB per octave (pink).

#### - e-Sweep

#### MLS versus Sweep

![alternative text](..\images\Fig-8-1.png)

#### e-Sweep versus lin-Sweep

![alternative text](..\images\Fig-8-2.png)

#### e-Sweep

![alternative text](..\images\Fig-8-3.png)

#### Lin-Sweep

![alternative text](..\images\Fig-8-4.png)

### 3. Related concepts

#### Flutter

In electronics and communication, flutter is the rapid variation of signal parameters, such as amplitude, phase, and frequency. Examples of electronic flutter are:

- Rapid variations in received signal levels, such as variations that may be caused by atmospheric disturbances, antenna movements in a high wind, or interaction with other signals.
- In radio propagation, a phenomenon in which nearly all radio signals that are usually reflected by ionospheric layers in or above the E-region experience partial or complete absorption.
- In radio transmission, rapidly changing signal levels, together with variable multipath time delays, caused by reflection and possible partial absorption of the signal by aircraft flying through the radio beam or common scatter volume.
- The variation in the transmission characteristics of a loaded telephone line caused by the action of telegraph direct currents on the loading coils.
- In recording and reproducing equipment, the deviation of frequency caused by irregular mechanical motion, e.g., that of capstan angular velocity in a tape transport mechanism, during operation.


![alternative text](..\images\Fig-8-5.png)


#### Waterfall plot

The Waterfall plot in Liberty Audiosuite or IMP is more properly known as the "Cumulative Spectral Decay" (CSD) plot. This plot technique is generally credited to Fincham and Berman (of KEF) who used it to detect resonances and internal box reflections in loudspeakers.

![alternative text](..\images\Fig-8-6.png)

A waterfall is a presentation of both frequency domain and time domain data on a single graph. Time domain data is voltage or pressure as a function of time, usually in the form of a measured impulse response (originating from a pulse or MLS measurement), which covers all time (but can be assumed to decay to insignificant levels within a finite time). The frequency domain version is the decomposition of the time domain impulse response into periodic cosine waveforms via Fourier analysis (the impulse response can be represented as a summation of an infinite number of cosine waves of different frequencies). In any combined time-frequency analysis, there are inherent resolution limitations due to the reciprocal relationship of time (seconds) and frequency (per second). One cannot, for instance, talk about a frequency at a point in time -- it is rather meaningless to discuss a periodic wave unless (at the very least) the time length of that period is considered. Hence, a frequency component cannot be said to start or stop at a specific time. But a band of frequencies can be analyzed in terms of its energy within a given time segment.

One relatively obvious way to do this is to select (or "window") only a portion of a time signal and perform a Fourier analysis over that section only, as if the time signal were zero elsewhere. This does generate some problems in that extraneous frequency components can be erroneously created by suddenly chopping a non-zero section of a signal to zero at the edges of the window. Use of windows which are tapered, on both edges of the time segment or on only one, can help to reduce (but not eliminate) this effect.

If the time length of the segment is kept constant, but its position in the time continuum is varied as one axis of the plot and the resulting spectrum of the Fourier analysis is shown versus the remaining axes, a plot known as the Short-Term Fourier Spectrogram results. This plot is an attempt to show "frequency response versus time". It has the disadvantage that it can contain no valid information for frequencies below 1/(time segment length); therefore, to get data down to 500 Hz, the segment length must be at least 2 msec. If, however, an echo occurs in a real-world measurement at 3 msec after the beginning of the impulse response, the time segment can be swept over only 1 msec if the echo characteristic is not to be included; this would not give much of a span for the time axis. If shorter time spans are analyzed, a frequency-vs-time plot can be made over a longer time length, but can only show data for very high frequencies and at poor resolution.

If, on the other hand, the entire active time trace is included for the initial transformation and then the position of the later edge of the window is held fixed relative to the beginning point of the impulse response, and only the earlier edge is varied, a CSD or LAUD waterfall plot results. Because the length of each time segment is being shortened with each successive step in the "time sweep", the lowest resolvable frequency increases (loses resolution) at later points in the plot. At the first traces of the plot, frequencies down to the anechoic limit can be displayed; successive curves will be valid to approximately 1/( windowed time span). In IMP and Audiosuite (and PRAXIS), data below the LF resolution cutoff are not plotted, resulting in an easily identified drop edge on the later traces of the plot; some other packages plot such below-resolution LF data anyway, although it is not meaningful and can lead to incorrect conclusions based on information which simply isn’t there.

The CSD waterfall does NOT show frequency response versus time! It shows that (approximate) frequency content contribution to a total response which occurs after the (relative) time shown in the time axis. At t=0 on the CSD plot, the entire frequency response is drawn, as the total response occurs after this time. At t=1msec the CSD plot shows the contribution to the frequency response which occurs after 1 msec but not before 1msec, and so on. But note the caution previously stated above: frequencies cannot start at a precise time! -- CSDs often show the user’s technique more than the speaker’s quality! Also note that if an echo pulse is included in the time segment selected for the waterfall plot, the frequency contribution of the echo will be present in the plot for all times before the echo occurs.
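
The construction described above (later window edge fixed, earlier edge stepped forward) can be sketched in a few lines. This is a simplified illustration, not the IMP or Audiosuite implementation: the function name `csd_slices` is hypothetical, a plain rectangular window is used instead of a tapered one, and the test signal is a synthetic 1 kHz resonance.

```python
import numpy as np

def csd_slices(h, fs, step_s, n_slices):
    """Cumulative spectral decay: spectra of h[start:] for increasing start."""
    n = len(h)
    step = int(round(step_s * fs))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    levels_db, f_min = [], []
    for i in range(n_slices):
        segment = h[i * step:]                       # later edge fixed, earlier edge moves
        spectrum = np.abs(np.fft.rfft(segment, n))   # zero-padded back to n samples
        levels_db.append(20.0 * np.log10(spectrum + 1e-12))
        f_min.append(fs / len(segment))              # roughly 1 / (windowed time span)
    return freqs, np.array(levels_db), np.array(f_min)

# Synthetic "resonance": a 1 kHz sinusoid decaying with a 10 ms time constant.
fs = 8000
t = np.arange(800) / fs
h = np.exp(-t / 0.01) * np.sin(2 * np.pi * 1000.0 * t)

freqs, levels_db, f_min = csd_slices(h, fs, step_s=0.002, n_slices=5)
ridge = levels_db[:, np.argmin(np.abs(freqs - 1000.0))]   # level at 1 kHz per slice
print(ridge)
print(f_min)
```

The level at 1 kHz falls steadily from slice to slice, which is the decaying ridge a resonance produces, while `f_min` rises because each later slice covers a shorter time span.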

CSD waterfalls are most often used to detect and display resonant behavior in speaker cones, boxes or horns. A resonance will show up as a long decaying ridge along the time axis, due to the "ringing" of the resonance over time. The data shown is VERY SUSCEPTIBLE to measurement and display conditions. The inherent windowing operation cuts into the most active part of the impulse response, and the result will change depending on the window shape used, on the step size used for the waterfall graphing routine, on display scales, on the low-frequency (even below resolution) content and phase, etc. Each curve trace on its own cannot be said to carry much useful information -- it is the overall surface, with its ridges, shelves and valleys, that is revealing: clearly in a qualitative sense, but only slightly in a quantitative one.

If one, for whatever reason, wanted to find the waterfall curve value at a point in time and at a specified frequency and read off a dB value there, one need only recreate the window edge condition for the waterfall plot and perform the corresponding FFT. For example, assume a waterfall plot has been made normally by setting the time markers so that marker number 1 is just before the activity in the time impulse response and marker number 2 is just before the first significant reflection. You want to read off a value at the waterfall plot’s "1.4 msec" trace, and at 3.3 kHz. Merely go back to the time domain plot (by pressing F1). Move marker number 1 to a position 1.4 msec to the right of its current position, go to the transform menu and select "FFT". The curve corresponding to the 1.4 msec trace of the waterfall plot will result, and you can use the frequency domain markers to read off any values of interest.

#### Spectrogram

A spectrogram is a visual way of representing the signal strength, or “loudness”, of a signal over time at various frequencies present in a particular waveform.  Not only can one see whether there is more or less energy at, for example, 2 Hz vs 10 Hz, but one can also see how energy levels vary over time.  In other sciences spectrograms are commonly used to display frequencies of sound waves produced by humans, machinery, animals, whales, jets, etc., as recorded by microphones.  In the seismic world, spectrograms are increasingly being used to look at frequency content of continuous signals recorded by individual or groups of seismometers to help distinguish and characterize different types of earthquakes or other vibrations in the earth. 

![alternative text](..\images\Fig-8-7.png)

##### How do you read a spectrogram?

Spectrograms are basically two-dimensional graphs, with a third dimension represented by colors. Time runs from left (oldest) to right (youngest) along the horizontal axis. In the seismic examples from the Pacific Northwest Seismic Network, each spectrogram shows 10 minutes of data, with the tick marks along the horizontal axis corresponding to 1-minute intervals. The vertical axis represents frequency, which can also be thought of as pitch or tone, with the lowest frequencies at the bottom and the highest frequencies at the top. The amplitude (or energy or “loudness”) of a particular frequency at a particular time is represented by the third dimension, color, with dark blues corresponding to low amplitudes and brighter colors up through red corresponding to progressively stronger (or louder) amplitudes.
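
The data behind such a plot can be sketched as a short-time Fourier transform: the signal is cut into overlapping windowed frames and each frame is transformed separately. The function below is a minimal NumPy sketch; the function name, window length, hop size and test signal are assumptions for illustration, and the plotting step is omitted.

```python
import numpy as np

def spectrogram(x, fs, win_len=256, hop=128):
    """Return frame times, bin frequencies and power in dB (frequency x time)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    power_db = 10.0 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs   # frame centres in s
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return times, freqs, power_db.T

# Test signal: 500 Hz for the first half second, then 2000 Hz.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

times, freqs, S = spectrogram(x, fs)
first_peak = freqs[np.argmax(S[:, 0])]    # strongest frequency in the first frame
last_peak = freqs[np.argmax(S[:, -1])]    # strongest frequency in the last frame
print(first_peak, last_peak)
```

Each column of `S` is one vertical strip of the spectrogram; mapping its values to colors gives the kind of image described above.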

### 4. Microphones

##### Omnidirectional


The prefix “omni-” comes from Latin and means “all”. Accordingly, an omnidirectional microphone captures sound equally from all directions. This means that whether you’re in front, behind, or on one side of the mic, it records the signal with equal strength.

The omni microphone is always great in situations where sound needs to be detected and recorded from different directions or locations. Those used very close to the source, such as lavaliers, headsets, and earsets, are usually omnidirectional microphones.


##### Hypercardioid


The hypercardioid is an often misunderstood relative of the well-known cardioid microphone polar pattern and is often confused with the supercardioid pattern. Understanding hypercardioid polar patterns and their ideal applications will improve your efficiency both on stage and in the studio.

What is a hypercardioid microphone? A hypercardioid microphone has a very directional hypercardioid polar/pickup pattern. It is most sensitive to on-axis sounds (where the mic “points”) with null points at 110° and 250° and a rear lobe of sensitivity. Hypercardioid mics are popular in film due to their high directionality.



##### Cardioid


A cardioid microphone is characterized as a unidirectional microphone. The polar pattern of the cardioid exhibits full sensitivity on-axis, whereas the sensitivity at 180° is in principle zero (a level of $-∞$ dB); in practice the rear rejection is greater than $20$ dB.

The $-3$ dB angle is at $±65.5°$. The $-6$ dB angle is at $±90°$.

In principle, this characteristic can be obtained by combining equal parts of the output of an omnidirectional microphone and a bidirectional microphone.

The polar equation is: $0.5 (1 + \cos\phi)$
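
The quoted $-3$ dB and $-6$ dB angles can be checked directly from the polar equation. A minimal sketch (the function name `cardioid_db` is an assumption for this example):

```python
import numpy as np

def cardioid_db(angle_deg):
    """Sensitivity of an ideal cardioid relative to on-axis, in dB."""
    sensitivity = 0.5 * (1.0 + np.cos(np.radians(angle_deg)))
    return 20.0 * np.log10(sensitivity)

level_65 = cardioid_db(65.5)   # close to -3 dB
level_90 = cardioid_db(90.0)   # close to -6 dB
print(round(level_65, 2), round(level_90, 2))
```

The function is not evaluated at 180°, where the ideal sensitivity is zero and the level in dB is unbounded.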


##### Supercardioid


The supercardioid is an often misunderstood relative of the well-known cardioid microphone polar pattern. Understanding the supercardioid pattern and when to use it will benefit you greatly in both the studio and the stage.

What is a supercardioid microphone? A supercardioid microphone has a very directional supercardioid polar/pickup pattern. It is most sensitive to on-axis sounds (where the mic “points”) with null points at 127° and 233° and a rear lobe of sensitivity. Supercardioid mics are popular in film due to their high directionality.
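
The null angles quoted for the hypercardioid (110°) and supercardioid (127°) patterns can be related to a common model. The sketch below assumes the usual first-order pattern $s(\theta) = a + (1 - a)\cos\theta$, where $a$ is the omnidirectional share; this model and the function names are assumptions for illustration and are not taken from the text above.

```python
import math

def first_order_weight(null_deg):
    """Omni share a in s(theta) = a + (1 - a) * cos(theta) for a given null angle."""
    c = math.cos(math.radians(null_deg))
    return -c / (1.0 - c)

def sensitivity(a, angle_deg):
    """Signed sensitivity of the first-order pattern at the given angle."""
    return a + (1.0 - a) * math.cos(math.radians(angle_deg))

def rear_level_db(a):
    """Level of the rear lobe (180 degrees) relative to on-axis, in dB."""
    return 20.0 * math.log10(abs(sensitivity(a, 180.0)))

a_hyper = first_order_weight(110.0)   # roughly 0.25
a_super = first_order_weight(127.0)   # roughly 0.38
print(round(a_hyper, 3), round(rear_level_db(a_hyper), 1))
print(round(a_super, 3), round(rear_level_db(a_super), 1))
```

Under this model the hypercardioid has the stronger rear lobe (about 6 dB below on-axis) and the supercardioid the weaker one (about 12 dB below), which matches the trade-off between side rejection and rear pickup.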



##### Unidirectional


Unidirectional microphones pick up sound with high gain only from a specific side or direction of the microphone. Thus, a user speaking into a unidirectional microphone must speak into the correct side, normally called the voice side, in order to get good gain on the recording. This is in contrast to omnidirectional microphones, which pick up sound equally from all directions.

A typical polar plot for a unidirectional microphone shows that the microphone has the highest gain when the sound source is directly in front of it, at the 0 degree reference point. In the example from the source, the gain at this point is 11 dB. As the sound source is rotated towards the sides, the gain decreases because the microphone is not as effective at picking up sounds from the sides. As the source is rotated further towards the rear, the gain drops more and more. Finally, at the rear, the microphone shows its lowest gain, which in this example is about -28 dB. This shows again that unidirectional microphones are most effective at picking up sounds from the front and much less effective at picking up sounds from the sides and rear.

Unidirectional microphones are used in applications where the target sound source is directly in front of the microphone and all other sounds in the room, at the sides and rear, should not be recorded. An example is recording a professor's lecture in a classroom. Where only the lecture needs to be recorded, without any noise coming from the students behind, a unidirectional microphone is a perfect fit. Since unidirectional microphones pick up sound well from the front while attenuating noise from the sides and rear, they allow a cleaner recording than omnidirectional microphones, which would pick up much more noise because they record from all sides. Thus, unidirectional microphones are well suited to cases where a sound source can stay stationary in front of the microphone and only that source should be recorded.

### 5. Parameters

Some parameters related to the microphones described in the last section are listed in the following image.

![alternative text](..\images\Fig-8-8.png)

### 6. Room Acoustics parameters (ISO 3382)

#### Strength G

The sound strength $G$ (ISO 3382) describes how much the room amplifies a source: it is the sound pressure level measured in the room relative to the level that the same omnidirectional source would produce at a distance of 10 m in a free field. It is the objective parameter most closely associated with the perceived level of sound, that is, with loudness. Loudness itself is the subjective perception of sound pressure. Every person perceives loudness differently, which means that it cannot be measured objectively. A larger vibration amplitude of the source, and of any vibrating surface, imparts more kinetic energy to the surrounding air.

Sound intensity is defined mathematically as the acoustic power per unit area normal to the direction of propagation: $I = P_{ac} / A_{\perp}$.

Physically measurable, on the other hand, is the sound pressure, which is converted into a sound level and expressed in decibels (dB). The second measurable quantity is the frequency, expressed in hertz (Hz), which is the number of air pressure fluctuations per second.

However, since sounds with the same sound level but different frequencies are not perceived as equally loud, there are also subjective measures of loudness: the phon and the sone.


| | |
|---|---|
| $$ G = L_p - L_W + 31 \ [dB] $$ |  |
| $$ G = L_p - L_{p(10m,dir)} \ [dB] $$ | **[22]** |
| $$ G = 10 \lg\frac{\int_0^{\infty} p^2 (t) dt}{\int_0^{\infty} p_{10,dir}^2 (t) dt} \ [dB] $$ | **[23]** |
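
Equation [23] can be evaluated directly on sampled impulse responses. The sketch below is a minimal illustration; the function name `strength_g` is an assumption, and both responses are assumed to be sampled at the same rate so that the time step cancels in the ratio.

```python
import numpy as np

def strength_g(p, p_ref_10m):
    """Sound strength G in dB from a room IR and a free-field IR at 10 m."""
    return 10.0 * np.log10(np.sum(p ** 2) / np.sum(p_ref_10m ** 2))

# Toy check: doubling the pressure everywhere raises the energy by a factor of 4.
p_ref = np.array([1.0, 0.5, 0.25])
g = strength_g(2.0 * p_ref, p_ref)
print(round(g, 2))
```

In practice the free-field reference at 10 m is rarely measured directly; it is usually derived from a calibration of the source, which this sketch does not cover.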

#### Reverberation Time $T_{60}$

Reverberation Time is the time it takes for a sound to decay by 60 dB and is sometimes abbreviated T60 or RT60. A T60 of less than 0.5 seconds is ideal for good speech clarity.

##### Sabine's Decay Formula

| | |
|---|---|
| $$ w(t) = w_{0} e^{-\frac{cAt}{4V}} \quad \text{for } t > 0 $$ | **[24]** |
| $$ T = \frac{24 \ln 10}{c} \frac{V}{A} $$ | **[25]** |
| $$ T = 0.163 \frac{V}{A} $$ | **[26]** |
| $$ T = \frac{V}{6A} $$ | **[27]** |
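
As a short derivation sketch of how [25] follows from [24]: the reverberation time $T$ is the time at which the energy density has fallen by 60 dB, that is, to $10^{-6}$ of its initial value. The numerical step assumes $c \approx 340$ m/s, a value that is not stated in the text above.

$$ \frac{w(T)}{w_0} = e^{-\frac{cAT}{4V}} = 10^{-6} \quad\Rightarrow\quad \frac{cAT}{4V} = 6 \ln 10 \quad\Rightarrow\quad T = \frac{24 \ln 10}{c} \frac{V}{A} \approx \frac{55.26}{340} \frac{V}{A} \approx 0.163 \frac{V}{A} $$

Equation [27] is then the rounded rule of thumb $0.163 \approx 1/6$.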

##### Eyring's formula

| | |
|---|---|
| $$ w(t) = w_{0} e^{\frac{cSt}{4V} \ln(1-\alpha) - mct} $$ | **[28]** |
| $$ T = 0.163 \frac{V}{4mV - S \ln(1-\alpha)} $$ | **[29]** |
| For $$ \alpha \ll 1, \ \ln(1-\alpha) \approx -\alpha $$, which (with $A = S\alpha$ and $m = 0$) reduces to Sabine's formula | **[30]** |
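
The two formulas can be compared numerically. The sketch below assumes a uniform absorption coefficient, so that the equivalent absorption area is $A = S\alpha$, and uses a hypothetical 5 m x 4 m x 3 m room; the function names are likewise assumptions for illustration.

```python
import math

def sabine_t60(volume, surface, alpha):
    """Sabine reverberation time [26] with A = S * alpha."""
    return 0.163 * volume / (surface * alpha)

def eyring_t60(volume, surface, alpha, m=0.0):
    """Eyring reverberation time [29]; m is the air attenuation constant."""
    return 0.163 * volume / (4.0 * m * volume - surface * math.log(1.0 - alpha))

# Hypothetical room: 5 m x 4 m x 3 m.
V = 5.0 * 4.0 * 3.0
S = 2.0 * (5.0 * 4.0 + 5.0 * 3.0 + 4.0 * 3.0)

for alpha in (0.05, 0.5):
    print(alpha, round(sabine_t60(V, S, alpha), 3), round(eyring_t60(V, S, alpha), 3))
```

For small absorption the two results differ by only a few percent, while for a heavily damped room Eyring's formula predicts a noticeably shorter reverberation time than Sabine's.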

![alternative text](..\images\Fig-8-9.png)


#### Early Decay Time EDT

According to ISO 3382, the Early Decay Time (EDT) is an acoustic parameter that is more closely related to perceived reverberance than $T_{60}$ and is strongly affected by the very early reflections. EDT is therefore a useful additional measure for characterising and optimising a room acoustics simulation. It is obtained from the time taken for the first 10 dB of decay, multiplied by 6:

| | |
|---|---|
| $$ EDT = 6t_{0,-10} [s] $$ | **[31]** |

![alternative text](..\images\Fig-8-10.png)

#### Reverberation time $T_{20}$

It can be difficult to put enough sound into a room to fully measure RT60 directly, so we often extrapolate it using just a portion of the decay.  If the time for the sound pressure level to decay by 20 dB is measured and multiplied by 3, we call our reverberation time a T20 measurement.

| | |
|---|---|
| $$ T_{20} = 3t_{(-5,-25)} [s] $$ | **[32]** |

![alternative text](..\images\Fig-8-11.png)

#### Reverberation time $T_{30}$

If we measure the time for the sound pressure level to decay by 30 dB and multiply by 2, this is called a T30 measurement. In both cases ($T_{20}$ or $T_{30}$), the measurement begins after the first 5 dB of decay.

| | |
|---|---|
| $$ T_{30} = 2t_{(-5,-35)} [s] $$ | **[33]** |

![alternative text](..\images\Fig-8-12.png)
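
EDT, $T_{20}$ and $T_{30}$ can all be estimated from an impulse response with the same procedure: compute the energy decay curve by backward (Schroeder) integration, fit a straight line over the relevant level range, and extrapolate the slope to 60 dB. The sketch below is a minimal illustration on a noise-free synthetic decay; the function names and the test signal are assumptions, and real measurements additionally need noise-floor handling that is not shown here.

```python
import numpy as np

def schroeder_db(h):
    """Energy decay curve in dB by backward integration of the squared IR."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def decay_time(h, fs, start_db, stop_db):
    """Time for a 60 dB decay, extrapolated from the fit between two levels."""
    edc = schroeder_db(h)
    t = np.arange(len(h)) / fs
    mask = (edc <= start_db) & (edc >= stop_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)   # slope in dB per second
    return -60.0 / slope

# Synthetic impulse response: exponential decay with a true T60 of 1.0 s.
fs = 8000
t60_true = 1.0
t = np.arange(int(2.0 * fs)) / fs
h = np.exp(-3.0 * np.log(10.0) * t / t60_true)

edt = decay_time(h, fs, 0.0, -10.0)    # EDT: 0 dB to -10 dB
t20 = decay_time(h, fs, -5.0, -25.0)   # T20: -5 dB to -25 dB
t30 = decay_time(h, fs, -5.0, -35.0)   # T30: -5 dB to -35 dB
print(round(edt, 3), round(t20, 3), round(t30, 3))
```

For a purely exponential decay all three values agree with the true reverberation time; in real rooms they differ, and those differences are what make EDT informative.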


#### Clarity $C_{80}$

C80 is the early-to-late energy ratio in dB, using the sound energy in the first 80 ms as the 'early' part. It is most often used as an indicator of music clarity.

| | |
|---|---|
| $$ C_{80} = 10 \lg \frac{\int_0^{0.080s} p^2 (t) dt }{\int_{0.080s}^\infty p^2 (t)dt } \ [dB] $$ | **[34]** |


![alternative text](..\images\Fig-8-13.png)
![alternative text](..\images\Fig-8-14.png)

#### Definition $D_{50}$

D50 is the early-to-total energy ratio, using the sound energy in the first 50 ms as the 'early' part. It is often quoted as a percentage.

| | |
|---|---|
| $$ D_{50} = \frac{\int_0^{0.050s} p^2 (t) dt }{\int_0^\infty p^2 (t)dt } \ [-] $$ | **[35]** |
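
Both ratios are simple sums over the squared impulse response. The sketch below assumes that the impulse response starts at the arrival of the direct sound (t = 0) and checks the result against the closed-form values for an ideal exponential decay; the function names and the test signal are assumptions for illustration.

```python
import numpy as np

def clarity_c80(h, fs):
    """Early-to-late energy ratio in dB with an 80 ms split."""
    n80 = int(round(0.080 * fs))
    energy = h ** 2
    return 10.0 * np.log10(np.sum(energy[:n80]) / np.sum(energy[n80:]))

def definition_d50(h, fs):
    """Early-to-total energy ratio with a 50 ms split (0 to 1)."""
    n50 = int(round(0.050 * fs))
    energy = h ** 2
    return np.sum(energy[:n50]) / np.sum(energy)

# Synthetic impulse response: exponential decay with T60 = 1.0 s.
fs = 8000
t60 = 1.0
t = np.arange(int(2.0 * fs)) / fs
h = np.exp(-3.0 * np.log(10.0) * t / t60)

c80 = clarity_c80(h, fs)
d50 = definition_d50(h, fs)

# Closed-form values for an exponential energy decay exp(-k * t).
k = 6.0 * np.log(10.0) / t60
c80_expected = 10.0 * np.log10(np.exp(k * 0.080) - 1.0)
d50_expected = 1.0 - np.exp(-k * 0.050)
print(round(c80, 2), round(d50, 3))
```

For this one-second decay the clarity is about 3 dB and the definition about 0.5, so roughly half of the energy arrives within the first 50 ms.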

#### Centre Time

Centre time, denoted $T_s$ (in s), is the centre of gravity of the squared impulse response (NF EN ISO 3382-1:2010):

| | |
|---|---|
| $$ T_{s} = \frac{\int_{0}^{\infty} t p^2 (t) dt }{\int_{0}^\infty p^2 (t)dt } [s] $$ | **[36]** |
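
Equation [36] translates into a weighted average of time with the squared impulse response as the weight. A minimal sketch, again using a synthetic exponential decay as an assumed test signal:

```python
import numpy as np

def centre_time(h, fs):
    """Centre of gravity of the squared impulse response, in seconds."""
    energy = h ** 2
    t = np.arange(len(h)) / fs
    return np.sum(t * energy) / np.sum(energy)

# Synthetic impulse response: exponential decay with T60 = 1.0 s.
fs = 8000
t60 = 1.0
t = np.arange(int(2.0 * fs)) / fs
h = np.exp(-3.0 * np.log(10.0) * t / t60)

ts = centre_time(h, fs)
ts_expected = t60 / (6.0 * np.log(10.0))   # closed form for an exponential decay
print(round(ts * 1000.0, 1), "ms")
```

For an ideal exponential decay the centre time is proportional to the reverberation time, about 72 ms per second of $T_{60}$.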

#### Early Lateral Energy LF

The early lateral energy fraction (LF) is a measure of spatial impression (SI). SI is generally considered to comprise two basic perceptual components: 1) auditory source width (ASW), a function of the “early” sound field, and 2) listener envelopment (LEV), a function of the “late” sound field. This discussion addresses only the effect of the “early” sound field, integrated to 80 msec, on the subjective SI of the space, and SI is used here to denote the effect of source broadening.

| | |
|---|---|
| $$ LF = \frac{\int_{0.005s}^{0.080s} p_{L}^2 (t) dt }{\int_{0}^{0.080s} p^2 (t)dt } \ [-] $$ | **[37]** |

where $p_L(t)$ is the impulse response measured with a figure-of-eight microphone whose null points towards the source, and $p(t)$ is the impulse response measured at the same position with an omnidirectional microphone.

#### Early Lateral Energy LFC

LFC, or Lateral Fraction Cosine, is defined by equation 38.

| | |
|---|---|
| $$ LFC = \frac{\int_{0.005s}^{0.080s} \mid p_{L}(t) \, p(t) \mid dt }{\int_{0}^{0.080s} p^2 (t)dt } \ [-] $$ | **[38]** |

where the impulse responses are the same as those for LF: $p_L(t)$ is measured with a figure-of-eight microphone and $p(t)$ with an omnidirectional microphone. This provides an approximation of a weighting of lateral reflections according to the cosine of the angle of incidence, which is thought to be better correlated with the subjective impression than the cosine-squared weighting of the LF.

#### Interaural Crosscorrelation IACC

The interaural cross-correlation coefficient (IACC) measures how similar the signals received by a listener's two ears are. The underlying normalised cross-correlation function (IACF) ranges from -1 to +1. A value of -1 means the signals are identical but completely out of phase, +1 means they are identical, and 0 means they have no correlation at all. Because the IACC is the maximum absolute value of the IACF, it lies between 0 and 1. The IACC will be nearly 1 for mono sources directly in front of or behind the listener, with lower values if the source is off to one side. IACC values are of interest to acousticians because a number of researchers have found that small IACC values correspond to greater degrees of envelopment and an overall more enjoyable listening experience in auditoriums.

| | |
|---|---|
| $$ IACF_{t1,t2}(\tau) = \frac{\int_{t1}^{t2} p_L(t) \, p_R(t + \tau) \, dt }{\sqrt{\int_{t1}^{t2} p_L^2(t) dt \int_{t1}^{t2} p_R^2(t) dt }} \ [-] $$ | **[39]** |
| $$ IACC_{t1,t2} = \max \mid IACF_{t1,t2}(\tau) \mid $$ | **[40]** |
| $$ -1 \, ms < \tau < +1 \, ms $$ |  |
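
Equations [39] and [40] can be sketched with a normalised cross-correlation restricted to lags of at most 1 ms. The function name `iacc`, the sample rate and the test signals are assumptions for this example; the integration limits $t_1$ and $t_2$ are assumed to have been applied already by slicing the two ear signals.

```python
import numpy as np

def iacc(p_left, p_right, fs, max_lag_s=0.001):
    """Maximum absolute normalised cross-correlation within +/- max_lag_s."""
    n = len(p_left)
    # Index i of the full correlation corresponds to a lag of i - (n - 1) samples.
    xcorr = np.correlate(p_right, p_left, mode="full")
    norm = np.sqrt(np.sum(p_left ** 2) * np.sum(p_right ** 2))
    max_lag = int(round(max_lag_s * fs))
    centre = n - 1
    iacf = xcorr[centre - max_lag:centre + max_lag + 1] / norm
    return np.max(np.abs(iacf))

fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(2000)

# Same signal at both ears, the right ear delayed by 0.5 ms (8 samples).
d = 8
left = np.concatenate([s, np.zeros(d)])
right = np.concatenate([np.zeros(d), s])
iacc_delayed = iacc(left, right, fs)

# Independent noise at the two ears.
iacc_independent = iacc(s, rng.standard_normal(2000), fs)
print(round(iacc_delayed, 3), round(iacc_independent, 3))
```

A pure interaural delay within the 1 ms range still gives a value of 1, while uncorrelated ear signals give a value close to 0.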


#### Speech Intelligibility

In terms of acoustics, speech intelligibility is a well-defined concept which indicates how well speech is perceived in a room – either directly with a speaker and a number of listeners, or via a sound system with a microphone, amplifier and speaker(s). Traditionally, speech intelligibility has simply been measured by listing how many words listeners have heard correctly in a text which is read out loud by an articulate speaker in the room in question. If the listeners, for example, hear 60 per cent of the words correctly, the speech intelligibility is 60 per cent or 0.6. Speech intelligibility as a concept was created in the early days of telephony in the 1920s in order to establish a target for telephone quality. It is not hard to imagine that this process can be very time-consuming, and therefore considerable efforts have been made over the years to calculate or directly measure speech intelligibility in a given situation, both in a real room such as a lecture hall, or via a PA system, for example at a railway station.

| | |
|---|---|
| $$ m(F) = \frac{\int_{-\infty}^{\infty} p^2 (t) e^{-j2\pi Ft} dt }{\int_{-\infty}^{\infty} p^2 (t)dt } [-] $$ | **[41]** |
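
Equation [41] can be evaluated numerically on a sampled impulse response. The sketch below takes the magnitude of the complex ratio, which is how the modulation transfer function is normally reported, and compares it with the known closed form for an ideal exponential decay, $m(F) = 1/\sqrt{1 + (2\pi F T / 13.8)^2}$. The function name `mtf`, the test signal and the chosen modulation frequencies are assumptions for illustration.

```python
import numpy as np

def mtf(h, fs, mod_freqs):
    """Magnitude of the modulation transfer function from an impulse response."""
    energy = h ** 2
    t = np.arange(len(h)) / fs
    total = np.sum(energy)
    return np.array([abs(np.sum(energy * np.exp(-2j * np.pi * F * t))) / total
                     for F in mod_freqs])

# Synthetic impulse response: exponential decay with T60 = 1.0 s.
fs = 8000
t60 = 1.0
t = np.arange(int(2.0 * fs)) / fs
h = np.exp(-3.0 * np.log(10.0) * t / t60)

mod_freqs = np.array([0.63, 2.0, 12.5])   # modulation frequencies in Hz
m = mtf(h, fs, mod_freqs)
m_expected = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * mod_freqs * t60 / (6.0 * np.log(10.0))) ** 2)
print(np.round(m, 3))
```

The result shows the low-pass character of reverberation: slow modulations pass almost unchanged, while fast modulations are strongly reduced.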

### 7. Speech Transmission Index

#### Signal modulation reduction



![alternative text](..\images\Fig-8-15.png)

![alternative text](..\images\Fig-8-16.png)



#### Modulation reduction

![alternative text](..\images\Fig-8-18.png)

| | |
|---|---|
| $$ m = \frac{A}{I_g} = 1 $$ | **[42]** |


![alternative text](..\images\Fig-8-19.png)

| | |
|---|---|
| $$ m = \frac{A}{I_g} < 1 $$ | **[43]** |

![alternative text](..\images\Fig-8-20.png)

| | |
|---|---|
| $$ m = \frac{A_i}{A_0} = 1 $$ | **[44]** |

![alternative text](..\images\Fig-8-21.png)

| | |
|---|---|
| $$ m = \frac{A_i}{A_0 + n} < 1 $$ | **[45]** |
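
The effect of stationary noise in [45] can be written in terms of the signal-to-noise ratio. Assuming a fully modulated signal ($A_i = A_0$) and a noise intensity $n$ with $n / A_0 = 10^{-SNR/10}$, the modulation index becomes $m = 1 / (1 + 10^{-SNR/10})$. The function name below is an assumption for this sketch.

```python
def modulation_index_with_noise(snr_db):
    """Modulation index of a fully modulated signal in stationary noise."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

for snr_db in (-10.0, 0.0, 10.0, 30.0):
    print(snr_db, round(modulation_index_with_noise(snr_db), 3))
```

At an SNR of 0 dB the noise is as strong as the signal and the modulation depth is halved; at high SNR the index approaches 1, as in the noise-free case [44].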


### 8. Speech Intelligibility

#### Speech Transmission Index: STI

Speech Transmission Index (STI) is a measure of speech transmission quality. The absolute measurement of speech intelligibility is a complex science. The STI measures some physical characteristics of a transmission channel (a room, electro-acoustic equipment, telephone line, etc.), and expresses the ability of the channel to carry across the characteristics of a speech signal. STI is a well-established objective measurement predictor of how the characteristics of the transmission channel affect speech intelligibility.

The influence that a transmission channel has on speech intelligibility is dependent on:

- the speech level
- frequency response of the channel
- non-linear distortions
- background noise level
- quality of the sound reproduction equipment
- echoes (reflections with delay > 100ms)
- the reverberation time
- psychoacoustic effects (masking effects)

| | |
|---|---|
|  $$ 0 < STI < 1 $$  |   |
| $$ m(F) = \frac{\int_{-\infty}^{\infty} p^2 (t) e^{-j2\pi Ft} dt }{\int_{-\infty}^{\infty} p^2 (t)dt } [-] $$ | **[41]** |


![alternative text](..\images\Fig-8-17.png)
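
The step from modulation indices to a single STI value can be sketched as follows. Each value of $m$ is converted to an apparent signal-to-noise ratio, limited to the range of -15 dB to +15 dB, and mapped linearly to a transmission index between 0 and 1. This is a deliberately simplified sketch: it uses a plain average and leaves out the octave-band weighting, masking and threshold corrections of the full standardised procedure. The function name is an assumption.

```python
import numpy as np

def sti_from_mtf(m):
    """Simplified, unweighted STI estimate from modulation indices."""
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1.0 - 1e-6)   # avoid log of 0
    snr_apparent = 10.0 * np.log10(m / (1.0 - m))               # in dB
    snr_apparent = np.clip(snr_apparent, -15.0, 15.0)
    transmission_index = (snr_apparent + 15.0) / 30.0
    return float(np.mean(transmission_index))

print(sti_from_mtf([0.5]))          # m = 0.5 corresponds to 0 dB
print(sti_from_mtf([1.0, 0.0]))     # best and worst case averaged
print(sti_from_mtf([0.9, 0.8, 0.6]))
```

A modulation index of 0.5 maps to the middle of the scale, and the clipping at 15 dB is what bounds the result between 0 and 1.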

### References

1. Kuttruff, Acoustics: An Introduction; Taylor and Francis , 2007
2. Impulse Responses. (n.d.). REW - Room EQ Wizard Room Acoustics Software. [Link](https://www.roomeqwizard.com/help/help_en-GB/html/impulseresponse.html)
3. Waterfall Plots. (n.d.). Liberty Instruments Home Page. [Link](http://www.libinst.com/wattlar.htm)
4. Spectrogram. (n.d.). Pacific Northwest Seismic Network. [Link](https://pnsn.org/spectrograms/what-is-a-spectrogram#:~:text=A%20spectrogram%20is%20a%20visual,energy%20levels%20vary%20over%20time).
5. Omnidirectional microphone vs unidirectional: how many do you need to know? (n.d.). SYNCO. [Link](https://www.syncoaudio.com/blogs/news/omnidirectional-microphone-vs-unidirectional#:~:text=Omnidirectional%20microphone%20definition&amp;text=The%20omni%20microphone%20is%20always,earsets,%20are%20usually%20omnidirectional%20mics).
6. Arthur. (2019, June 25). What Is A Hypercardioid Microphone? (Polar Pattern + Mic Examples) | My New Microphone. My New Microphone. [Link](https://mynewmicrophone.com/what-is-a-hypercardioid-microphone-polar-pattern-mic-examples/)
7. DPA Microphones. (n.d.). DPA. [Link](https://www.dpamicrophones.com/mic-dictionary/cardioid-microphone#:~:text=A%20cardioid%20microphone%20is%20characterized,is%20at%20±90°).
8. Arthur. (2019, June 25). What Is A Supercardioid Microphone? (Polar Pattern + Mic Examples) | My New Microphone. My New Microphone. [Link](https://mynewmicrophone.com/what-is-a-supercardioid-microphone-polar-pattern-mic-examples/)
9. What are Unidirectional Microphones? (n.d.). Learning about Electronics. [Link](http://www.learningaboutelectronics.com/Articles/What-are-unidirectional-microphones)
10. What is sound intensity? | auersignal.com. (n.d.). Auer Signal - sichere Signaltechnik | auersignal.com. [Link](https://www.auersignal.com/en/technical-information/audible-signalling-equipment/sound-intensity/#:~:text=The%20unit%20of%20intensity%20of,pressure%20level%20or%20sound%20level).
11. Reverberation Time Definition. [Link](https://www.soundassured.com/blogs/blog/reverberation-time-calculator-and-definition#:~:text=Reverberation%20Time%20is%20the%20time,ideal%20for%20good%20speech%20clarity).
12. The control of early decay time on auralization results based on geometric acoustic modelling - White Rose Research Online. (n.d.). White Rose Research Online. [Link](https://eprints.whiterose.ac.uk/75128/#:~:text=According%20to%20ISO%203382,%20Early,optimising%20a%20room%20acoustics%20simulation).
13. Reverberation Time in Room Acoustics. (n.d.). Larson Davis. [Link](http://www.larsondavis.com/learn/building-acoustics/Reverberation-Time-in-Room-Acoustics#:~:text=If%20the%20time%20for%20the,is%20called%20a%20T30%20measurement).
14. Clarity Graph. (n.d.). REW - Room EQ Wizard Room Acoustics Software. [Link](https://www.roomeqwizard.com/help/help_en-GB/html/graph_clarity.html#:~:text=Clarity%20C80,ms%20as%20the%20'early'%20part)
15. Room acoustics parameters — Documentation I-Simpa 1.3.4. (n.d.). I-Simpa User Guide — Documentation I-Simpa 1.3.4. [Link](https://i-simpa-wiki.readthedocs.io/fr/latest/room_acoustics_parameters.html)
16. Speech intelligibility and speech intelligibility goals. (n.d.). Better acoustics with Troldtekt acoustic panels. [Link](https://www.troldtekt.com/knowledge/good-acoustics/advanced-acoustics/speech-intelligibility-and-speech-intelligibility-goals/#:~:text=In%20terms%20of%20acoustics,%20speech,amplifier%20and%20speaker(s)).
17. Interaural Cross Correlation. [Link](https://www.sweetwater.com/insync/interaural-cross-correlation-iacc/#:~:text=The%20measure%20of%20the%20difference,but%20completely%20out%20of%20phase).
18. Contributors to Wikimedia projects. (2005, August 11). Speech transmission index - Wikipedia. Wikipedia, the free encyclopedia. [Link](https://en.wikipedia.org/wiki/Speech_transmission_index)