Setting up an auditory model

From other auditory modeling frameworks you are most probably used to having one function per auditory model. In addition, you normally first create your complete binaural signal, which is then fed into that function. For example, the following call would return a localisation probability distribution for the binaural input signal:

>> out = may2011(input,fs);

In the things work differently as the model has the ability to interact with the scene. This means that we have no fixed binaural signal, but only a description of the acoustic scene. For an introduction see sec-tutorial-acoustic-scene. The binaural signal is then created on the fly in a block-based manner during the processing of the auditory model and can be changed if the model decides to interact with the scene, for example by rotating its head. Another difference is that the does not come with different functions for different tasks like localisation, speaker identification and so on. It always has its central brain the in which sec-knowledge-sources are available for different tasks. In the end it is planned that the resolves by itself which knowledge sources are needed in order to solve the given task, at the moment the user has to define which knowledge sources should be active.

Note

If you don't need the brain unit of the model, but are only interested in extracting particular auditory cues from your input signals or a simulated acoustic scene, you can use the Auditory Front-End as a standalone module that extracts auditory cues for you. This is further explained in its sec-afe-overview section.
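As a rough sketch, standalone cue extraction with the Auditory Front-End follows a data-object/manager pattern along the lines below. The class names, request identifiers, and data-access pattern are quoted from memory and should be verified against the sec-afe-overview documentation; treat everything here as an assumption:

```matlab
% Sketch of standalone cue extraction -- class names, request strings, and
% the data-access pattern are assumptions; check the AFE documentation.
dataObj    = dataObject(earSignals, fsHz);  % wrap the binaural ear signals
managerObj = manager(dataObj);              % instantiate the processing manager
managerObj.addProcessor('ild');             % request a cue, e.g. interaural level differences
managerObj.processSignal();                 % run the extraction
ildCues = dataObj.ild{1}.Data(:);           % access the extracted cue representation
```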

In the following we explain this with a simple example. Let's assume we have the acoustic scene definition from the sec-binaural-renderer tutorial, stored in the file binaural_renderer.xml. Now we want to configure the model to localise the sound source in that scene. As described above, this means we have to define which knowledge sources should be active. This configuration is also done via an XML file, which for the localisation task would look like the one below. The file is called blackboard.xml and can be found in the examples/first_steps/setting_up_an_auditory_model folder:

<?xml version="1.0" encoding="utf-8"?>
<blackboardsystem>

    <dataConnection Type="AuditoryFrontEndKS"/>

    <KS Name="loc" Type="GmmLocationKS"/>
    <KS Name="conf" Type="ConfusionKS"/>
    <KS Name="confSolv" Type="ConfusionSolvingKS"/>
    <KS Name="rot" Type="RotationKS">
        <Param Type="ref">robotConnect</Param>
    </KS>

    <Connection Mode="replaceOld" Event="AgendaEmpty">
        <source>scheduler</source>
        <sink>dataConnect</sink>
    </Connection>
    <Connection Mode="replaceOld">
        <source>dataConnect</source>
        <sink>loc</sink>
    </Connection>
    <Connection Mode="add">
        <source>loc</source>
        <sink>conf</sink>
    </Connection>
    <Connection Mode="replaceOld" Event="ConfusedLocations">
        <source>conf</source>
        <sink>rot</sink>
    </Connection>
    <Connection Mode="add" Event="ConfusedLocations">
        <source>conf</source>
        <sink>confSolv</sink>
    </Connection>

</blackboardsystem>

It looks a little complicated, so let us go through it step by step. First, the blackboard needs some data it can work on. This data has the form of auditory cues, which are extracted by the Auditory Front-End. A special knowledge source gets this data into the blackboard; it is set up in line 4. After that, in lines 6-11, four other knowledge sources are included for our actual localisation task. Some of them come with parameters, like robotConnect for the RotationKS, which defines the connection to the system where the head can actually be rotated. Localisation starts with the GmmLocationKS; the other knowledge sources then interpret its probability distribution data and see if they are able to estimate a single location from it. If not, the RotationKS is triggered to turn the head of the listener in order to get a better estimate. The <Connection> tags establish the communication between the knowledge sources. For a detailed explanation have a look at sec-blackboard-configure.

After having configured the acoustic scene in binaural_renderer.xml and the blackboard in blackboard.xml, we can finally run the model:

% Initialise binaural simulator
sim = simulator.SimulatorConvexRoom('binaural_renderer.xml');
sim.set('Init', true);
% Initialise blackboard and connect to binaural simulation
bbs = BlackboardSystem(0);
bbs.setRobotConnect(sim);
bbs.buildFromXml('blackboard.xml');
% Run model
bbs.run();

Once the model has finished, we are of course interested in seeing the results. You can inspect them as follows:

>> sourceAzimuth = 63; % real physical position of the source
>> perceivedAzimuths = bbs.blackboard.getData('perceivedAzimuths');
>> displayLocalisationResults(perceivedAzimuths, sourceAzimuth);

------------------------------------------------------------------------------------
Reference target angle:  63 degrees
------------------------------------------------------------------------------------
Localised source angle:
BlockTime   PerceivedAzimuth    (head orient., relative azimuth)    Probability
------------------------------------------------------------------------------------
  0.56       65 degrees     (  0 degrees,     65 degrees)   0.52
  1.02       65 degrees     ( 20 degrees,     45 degrees)   0.55
  1.58      235 degrees     ( 40 degrees,    195 degrees)   0.35
  2.04       60 degrees     ( 60 degrees,      0 degrees)   0.38
  2.51      235 degrees     ( 80 degrees,    155 degrees)   0.42
------------------------------------------------------------------------------------
Mean localisation error: 11.7065
------------------------------------------------------------------------------------

Here you see, for several points in time, the predicted location together with the actual head orientation of the listener and the probability of the predicted location. If the probability is below a given threshold, the model decides to turn its head towards the most likely source position.
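The decision rule described above could look like the following in Matlab. This is purely illustrative: the threshold value, variable names, and helper function are assumptions, not the framework's actual implementation:

```matlab
% Illustrative only -- threshold value and helper names are assumptions.
[maxProb, bestIdx] = max(azimuthProbabilities);  % most likely source azimuth
if maxProb < probabilityThreshold
    % The estimate is too ambiguous (e.g. a front-back confusion):
    % turn the head towards the currently most likely position
    turnHeadTowards(azimuthGrid(bestIdx));
end
```

In the table above, this is visible at block times 1.58 s and 2.51 s, where the low-probability 235-degree estimates (front-back confusions of the true 63-degree source) coincide with changes of the head orientation.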

For details on how to view more results, like the extracted binaural cues, have a look at sec-examples-localisation-details. And for more details on localisation and on how to switch off the head rotations of the model, see sec-examples-localisation.