This section describes the knowledge sources that are currently available in the blackboard system.
The abstract knowledge source (AbstractKS class) is the base class for all knowledge sources in the blackboard system. The corresponding file AbstractKS.m is located in the src/blackboard_core directory, as opposed to the implementations of actual knowledge sources, which are located in the directory src/knowledge_sources. The listing below shows the parts most relevant to the development of new knowledge sources:
class AbstractKS
    properties
        blackboard;
        blackboardSystem;
        invocationMaxFrequency_Hz;
        trigger;
    events
        KsFiredEvent
    methods (Abstract)
        canExecute()
        execute()
    methods
        focus()
        unfocus()
There are different aspects of functionality in this interface.
- Data access
  Knowledge sources have a handle to the blackboard. Through this handle, data can be placed on and retrieved from the blackboard.
- System setup
  Through the handle blackboardSystem, knowledge sources get access to the methods for adding and removing other knowledge sources, and also access to the BlackboardMonitor.
- Execution properties
  The property invocationMaxFrequency_Hz specifies how often this knowledge source is allowed to be executed. The methods focus and unfocus give access to the attentional priority of the knowledge source, which influences its relative importance when competing for computing resources with other knowledge sources. See Section sec-scheduler for a description of scheduling.
- Execution conditional
  The abstract method canExecute must be implemented by the inheriting knowledge source. It is called by the scheduler when the knowledge source is next in the schedule, before actually executing it. If this method returns false, execution will not be performed. The second output argument of this method indicates whether the knowledge source should remain in the agenda or be removed.
- Execution
  The main functionality of any knowledge source is implemented in the method execute. A knowledge source gets executed by the scheduler if its maximum invocation frequency would not be exceeded and its canExecute method returns true. In this method, a knowledge source gets access to its trigger, a structure that contains information about the triggering event, the triggering source, and an argument the trigger source placed for usage by sinks.
- Events
  Knowledge sources can define their own individual events. However, each class already inherits a standard event from AbstractKS, KsFiredEvent. Events can be triggered by knowledge sources via obj.notify(eventname, attachedData).
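The execution rule described under "Execution" can be illustrated with a small stand-alone sketch. It is not the scheduler's actual code: the timing variables, the stubbed canExecute handle and the order of the two checks are assumptions made purely for illustration.

```matlab
% Hypothetical sketch of the gating rule; not taken from the scheduler source.
invocationMaxFrequency_Hz = 2;      % at most two executions per second
lastExecutionTime_s = 10.0;         % assumed time of the previous execution
currentTime_s = 10.3;               % assumed current time

% Stub standing in for a knowledge source's canExecute method: the first
% output says whether to execute, the second whether to stay in the agenda.
canExecuteStub = @() deal(true, false);

% Minimum time that has to pass between two executions.
% With a frequency of inf this becomes 0, i.e. there is no limit.
minInterval_s = 1 / invocationMaxFrequency_Hz;
frequencyOk = (currentTime_s - lastExecutionTime_s) >= minInterval_s;

doExecute = false;
if frequencyOk
    [canExec, keepInAgenda] = canExecuteStub();
    doExecute = canExec;
end
```

With the values above, only 0.3 s have passed while 0.5 s are required, so doExecute stays false even though the stubbed canExecute would return true.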
This knowledge source integrates the Auditory Front-End into the blackboard system. The Auditory Front-End itself is a self-contained module, and this section focuses on its integration within the framework.
The knowledge source is connected to the blackboard and the robot interface by registering itself in the system via BlackboardSystem.setDataConnect. Upon construction, the dataObject and managerObject are instantiated and connected to the ear signals stream of the robot interface. The maximum invocation frequency of the AuditoryFrontEndKS is set to infinity. Execution mainly consists of getting the latest chunk of ear signals data, processing it through the Auditory Front-End, and notifying a KsFiredEvent.
Other knowledge sources can register requests with the Auditory Front-End indirectly, by inheriting from the AuditoryFrontEndDepKS class (see Section sec-afe-dep-knowledge-source) and binding to its KsFiredEvent.
Whenever a knowledge source needs signals, cues or features from the Auditory Front-End, it should subclass the AuditoryFrontEndDepKS class. Any knowledge source added to the blackboard through the BlackboardSystem methods addKS or createKS registers its requests automatically with the Auditory Front-End.
- Setting up the requests
  Inheriting knowledge sources need to put their requests in their call to the super-constructor: obj@AuditoryFrontEndDepKS(requests), with requests being a cell array of structures, each with the fields name, stating the requested signal name, and params, specifying the signal parameters. (Have a look at sec-afe-processors.) An example looks like this:
      requests{1}.name = 'modulation';
      requests{1}.params = genParStruct( ...
          'nChannels', obj.amFreqChannels, ...
          'am_type', 'filter', ...
          'am_nFilters', obj.amChannels ...
          );
      requests{2}.name = 'ratemap_magnitude';
      requests{2}.params = genParStruct( ...
          'nChannels', obj.freqChannels ...
          );
  The params field always needs to be populated by a call to the genParStruct method.
- Accessing signals
  These requested signals can then be accessed by the knowledge source via the inherited getAFEdata method, which returns a map of handles to the actual signals, keyed by the indexes used in the request structure. An example, according to the requests example above, looks like this:
      afeData = obj.getAFEdata();
      modSobj = afeData(1);
      rmSobj = afeData(2);
      rmBlock = rmSobj.getSignalBlock(0.5,0);
A more elaborate description of the request parameter structure and the signal objects can be found in the help for the Two!Ears Auditory Front-End <../afe/index>. Have a look at the implementation of the GmtkLocationKS to see a real-world example of how to subclass AuditoryFrontEndDepKS.
Four knowledge sources work together to generate hypotheses of sound source azimuths: the Location knowledge source, the Confusion Detection knowledge source, the Confusion Solving knowledge source, and the Head Rotation knowledge source.
Class DnnLocationKS implements knowledge about the statistical relationship between spatial cues and azimuth locations using deep neural networks (DNNs). Currently the DNNs are trained on binaural cues from the Auditory Front-End, as described in more detail in [MaEtAl2015dnn].
This knowledge source requires signals from the Auditory Front-End and thus inherits from AuditoryFrontEndDepKS (Section sec-afe-dep-knowledge-source) and needs to be bound to the KsFiredEvent of the AuditoryFrontEndKS. The canExecute precondition checks the energy level of the current signal block, so localisation takes place only if there is an actual auditory event. After execution, a SourcesAzimuthsDistributionHypothesis containing a probability distribution of azimuth locations is placed on the blackboard (category sourcesAzimuthsDistributionHypotheses) and the event KsFiredEvent is notified.
binds to | AuditoryFrontEndKS.KsFiredEvent |
writes data category | sourcesAzimuthsDistributionHypotheses |
triggers event | KsFiredEvent |
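The energy precondition checked in canExecute can be pictured with the following stand-alone sketch. The RMS measure, the threshold value and the test signals are assumptions chosen for illustration; the actual knowledge source may measure the energy differently.

```matlab
% Hypothetical energy gate; threshold and signals are illustrative only.
fs = 44100;                               % assumed sampling rate in Hz
t = (0:fs/2-1)' / fs;                     % half a second of samples
quietBlock = 1e-4 * sin(2*pi*440*t);      % near-silent block
loudBlock  = 0.5  * sin(2*pi*440*t);      % block containing a clear tone

energyThreshold = 0.01;                   % assumed threshold on the RMS level
blockRms = @(x) sqrt(mean(x.^2));         % root-mean-square level of a block

% Localisation would only be attempted for blocks above the threshold.
canExecuteQuiet = blockRms(quietBlock) > energyThreshold;
canExecuteLoud  = blockRms(loudBlock)  > energyThreshold;
```

Here the near-silent block is rejected and the block containing the tone is accepted.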
Class GmtkLocationKS implements knowledge about the statistical relationship between spatial cues and azimuth locations. Currently the relationship is modelled with statistical models that are trained on binaural cues from the Auditory Front-End.
This knowledge source requires signals from the Auditory Front-End and thus inherits from AuditoryFrontEndDepKS (Section sec-afe-dep-knowledge-source) and needs to be bound to the KsFiredEvent of the AuditoryFrontEndKS. The canExecute precondition checks the energy level of the current signal block, so localisation takes place only if there is an actual auditory event. After execution, a SourcesAzimuthsDistributionHypothesis containing a probability distribution of azimuth locations is placed on the blackboard (category sourcesAzimuthsDistributionHypotheses) and the event KsFiredEvent is notified.
binds to | AuditoryFrontEndKS.KsFiredEvent |
writes data category | sourcesAzimuthsDistributionHypotheses |
triggers event | KsFiredEvent |
The ConfusionKS checks new location hypotheses and decides whether there is a confusion. A confusion emerges when there are more valid locations in the hypotheses than assumed auditory sources in the scene. In case of a confusion, a ConfusedLocations event is notified and the responsible location hypothesis is placed on the blackboard in the confusionHypotheses category. Otherwise, a PerceivedAzimuth object is added to the blackboard perceivedAzimuths data category, and the standard event is triggered.
binds to | GmtkLocationKS.KsFiredEvent |
reads data category | sourcesAzimuthsDistributionHypotheses |
writes data category | confusionHypotheses or perceivedAzimuths |
triggers event | ConfusedLocations or KsFiredEvent |
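The confusion criterion (more valid locations than assumed sources) can be sketched as follows. The validity threshold, the azimuth grid and the probability values are made up for illustration and are not taken from the ConfusionKS implementation.

```matlab
% Hypothetical confusion check; the threshold and the data are made up.
azimuths = (0:30:330)';                         % candidate azimuths in degrees
posterior = [0.02 0.40 0.02 0.02 0.02 0.40 ...
             0.02 0.02 0.02 0.02 0.02 0.02]';   % probabilities, summing to 1
numAssumedSources = 1;                          % assumed number of sources
validThreshold = 0.15;                          % assumed validity threshold

% Locations whose probability exceeds the threshold count as valid.
validIdx = find(posterior > validThreshold);
validAzimuths = azimuths(validIdx);

% A confusion exists when there are more valid locations than sources.
isConfusion = numel(validIdx) > numAssumedSources;
```

In this example the distribution has two valid locations (30 and 150 degrees) for a single assumed source, which is the typical front-back confusion case.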
The ConfusionSolvingKS solves localisation confusions by predicting the location probability distribution after head rotation, and comparing it with new location hypotheses received after the head rotation is completed. The canExecute method waits for a new location hypothesis; once one is available, it checks whether the head has been turned, and the knowledge source does not execute if it has not. The confusion is then solved by using the old and the new location hypothesis, and a PerceivedAzimuth object is placed on the blackboard.
binds to | ConfusionKS.ConfusedLocations |
reads data category | confusionHypotheses , headOrientation and sourcesAzimuthsDistributionHypotheses |
writes data category | perceivedAzimuths |
triggers event | KsFiredEvent |
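The idea of predicting the distribution after a head rotation and comparing it with the new hypothesis can be sketched as below. This is only an illustration under stated assumptions: the 5-degree grid, the Gaussian-shaped bumps, the sign convention of the rotation and the multiplicative combination are all hypothetical choices, not the ConfusionSolvingKS implementation.

```matlab
% Hypothetical sketch: resolve a front-back confusion after a head rotation.
stepDeg = 5;
azimuths = (0:stepDeg:355)';        % head-relative azimuths in degrees
circDist = @(a, b) abs(mod(a - b + 180, 360) - 180);
bump = @(centre) exp(-0.5 * (circDist(azimuths, centre) / 10).^2);

% Before the rotation: true source at 30 deg, mirrored candidate at 150 deg.
oldPost = bump(30) + bump(150);
oldPost = oldPost / sum(oldPost);

% Assumed convention: the head turns by +20 deg, so every fixed source
% moves by -20 deg in head-relative coordinates.
headRotationDeg = 20;
predictedPost = circshift(oldPost, -headRotationDeg / stepDeg);

% After the rotation: the source appears at 10 deg, its mirror at 170 deg.
newPost = bump(10) + bump(170);
newPost = newPost / sum(newPost);

% Only the true source is consistent with both distributions.
combined = predictedPost .* newPost;
combined = combined / sum(combined);
[~, bestIdx] = max(combined);
perceivedAzimuth = azimuths(bestIdx);
```

The mirrored candidates move in opposite directions under rotation (150 to 130 degrees in the prediction, but 170 degrees in the new hypothesis), so only the peak at 10 degrees survives the combination.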
The RotationKS has knowledge on how to move the robotic head in order to solve confusions in source localisation. If there is no other head rotation already scheduled, the knowledge source uses the robot interface to turn the head.
binds to | ConfusionKS.ConfusedLocations |
reads data category | confusionHypotheses , headOrientation |
writes data category | headOrientation |
This section focuses on implementation of sound identification knowledge sources within the blackboard framework.
Objects of class IdentityKS implement source type models by incorporating an instance of a model (which has to implement the models.Base interface) with knowledge about the relationship between auditory cues and certain sound source types. Many identity knowledge sources can be used concurrently; usually, for each sound class to be identified, you would instantiate an object of class IdentityKS with the respective model. The models are loaded from directories you specify upon construction, and should be created with the training pipeline (see sec-idTrainPipeline). The model object of IdentityKS can employ any kind of model, such as a linear support vector machine or a Gaussian mixture model. The IdentityKS needs access to signals, thus it is a subclass of AuditoryFrontEndDepKS (see Section sec-afe-dep-knowledge-source). The model object holds the signal request structure.
The knowledge source predicts, based on the incorporated source model, whether the currently received auditory stream includes an auditory object of the sound type it represents.
binds to | AuditoryFrontEndKS.KsFiredEvent |
writes data category | identityHypotheses |
triggers event | KsFiredEvent |
Have a look at the example sec-examples-identification to see IdentityKS in action.
This knowledge source checks new identity hypotheses. It then decides which of them are valid by comparing them and by incorporating knowledge about the number of assumed auditory objects in the scene.
binds to | IdentityKS.KsFiredEvent |
reads data category | identityHypotheses |
writes data category | identityDecision |
triggers event | KsFiredEvent |
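One simple way to picture such a decision is to keep the most probable hypotheses, up to the assumed number of auditory objects. The rule below is a hypothetical sketch; the labels, probabilities and acceptance threshold are invented, and the actual decision logic may differ.

```matlab
% Hypothetical decision rule; labels, probabilities and threshold are made up.
labels = {'alarm', 'speech', 'dog', 'piano'};
probabilities = [0.81 0.64 0.58 0.12];   % one identity hypothesis per class
numAssumedObjects = 2;                   % assumed number of auditory objects
minProbability = 0.5;                    % assumed acceptance threshold

% Rank the hypotheses and keep at most numAssumedObjects of them.
[~, order] = sort(probabilities, 'descend');
numCandidates = min(numAssumedObjects, numel(order));
keep = order(1:numCandidates);

% Discard candidates that are too improbable to count as valid.
keep = keep(probabilities(keep) >= minProbability);
decidedLabels = labels(keep);
```

Although three hypotheses exceed the threshold here, only the two most probable ones are kept because two auditory objects are assumed.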
This is not really a knowledge source in the strict sense of the word, but rather a way to enable live inspection of the identity information in the blackboard system. Upon construction, it takes ground truth information about event labels, onset and offset times; when triggered, it displays this in comparison with the actual hypotheses created by the identity knowledge sources. The figure fig-identify shows an example of the produced plot.
binds to | IdentityKS.KsFiredEvent |
reads data category | identityHypotheses |
The class ColorationKS implements the prediction of the perceived change in timbre of an auditory event compared to a reference. The reference is not fixed, but can be learned and is stored inside the blackboard memory. At the moment the learning is implemented in a very low-level fashion: the first signal the knowledge source is confronted with is learned as the reference, and all later signals are compared to that reference. The colorationHypotheses value then typically lies between 0 and 1, although it is not hard-limited and can be larger than 1 for some conditions. The actual value is calculated using the naturalness model from [MooreTan2004], which compares the weighted excitation patterns of the reference and the test stimulus.
binds to | AuditoryFrontEndKS.KsFiredEvent |
reads data category | colorationReference |
writes data category | colorationHypotheses or colorationReference |
triggers event | KsFiredEvent |
As the current implementations of GmtkLocationKS and DnnLocationKS are not able to predict the localisation reliably under difficult conditions, we introduced a different location knowledge source as an intermediate solution. This ItdLocationKS is optimised for predicting the perceived direction of a sound source created by a spatial audio system. It uses only ITD cues under 1400 Hz and utilises a lookup table to match those values to the corresponding angles. This implies that the knowledge source is not able to distinguish between front and back. Besides the lookup table, it also uses outlier detection when integrating the perceived angles over the different frequency channels, as suggested in [Wierstorf2014]. The output of the knowledge source is a sourcesAzimuthsDistributionHypotheses identical to the output of the GmtkLocationKS or DnnLocationKS.
binds to | AuditoryFrontEndKS.KsFiredEvent |
writes data category | sourcesAzimuthsDistributionHypotheses |
triggers event | KsFiredEvent |
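The combination of a lookup table and outlier rejection across frequency channels can be sketched as follows. The sine-law table, the maximum ITD, the deviation tolerance and the median-based rejection are assumptions for illustration only; they do not reproduce the table used by ItdLocationKS or the method from [Wierstorf2014].

```matlab
% Hypothetical lookup table built from a simple sine law.
maxItd_s = 7e-4;                                   % assumed maximum ITD
tableAngles = (-90:1:90)';                         % frontal angles in degrees
tableItd_s = maxItd_s * sin(tableAngles * pi/180); % strictly increasing

% One ITD value per frequency channel; the last channel is an outlier.
channelItd_s = maxItd_s * sin([30 30 30 30 -80]' * pi/180);

% Map each channel ITD to an angle via the lookup table.
channelAngles = interp1(tableItd_s, tableAngles, channelItd_s, 'linear');

% Simple outlier rejection: drop channels far away from the median angle.
maxDeviationDeg = 30;                              % assumed tolerance
inlier = abs(channelAngles - median(channelAngles)) <= maxDeviationDeg;

% Integrate the remaining channels into a single perceived angle.
estimatedAngle = mean(channelAngles(inlier));
```

Because the table only covers angles from -90 to 90 degrees, a source behind the listener would be mapped to its frontal mirror position, which mirrors the front-back limitation mentioned above.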
This section focuses on implementation of knowledge sources for the segmentation of auditory features within the blackboard framework.
The segmentation knowledge source generates hypotheses about the assignment of individual time-frequency units to sound sources present in a scene. This assignment is done probabilistically; hence, each time-frequency unit is associated with a unique discrete probability distribution. These distributions can be interpreted as soft-masks, which can be used to generate segmented auditory features. Specifically, each auditory feature that can be represented in the time-frequency domain can be modified by a corresponding soft-mask. The soft-masks are generated by a probabilistic clustering approach based on a mixture of von Mises distributions over estimated angular positions of the sound sources. These positions can either be estimated by the SegmentationKS itself or provided by a SourcesAzimuthsDistributionHypothesis on the blackboard. If not all source positions can be reliably estimated by the DnnLocationKS, the remaining positions are estimated during the segmentation process. All estimated positions are stored together with their corresponding circular uncertainties in a sourceAzimuthHypotheses object for each sound source. Additionally, the estimated soft-masks are stored in a sound-source-specific segmentationHypotheses object. Each sourceAzimuthHypotheses and segmentationHypotheses object contains a unique source identifier tag, enabling other knowledge sources to associate each soft-mask with the corresponding source position. The current implementation of the SegmentationKS relies on a pre-defined number of sound sources that will be present in the scene.
binds to | AuditoryFrontEndKS.KsFiredEvent |
reads data category | sourcesAzimuthsDistributionHypotheses |
writes data category | sourceAzimuthHypotheses and segmentationHypotheses |
triggers event | KsFiredEvent |
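How soft-masks follow from a mixture of von Mises distributions can be sketched as below. The source positions, the shared concentration parameter, the mixture weights and the per-unit angle estimates are all assumed values; the sketch only shows the posterior computation, not the clustering that estimates these parameters in the SegmentationKS.

```matlab
% Hypothetical soft-mask computation from a von Mises mixture.
sourceAzimuthsDeg = [-40 35];      % assumed source positions (mixture means)
kappa = 8;                         % assumed concentration, shared by sources
weights = [0.5 0.5];               % assumed mixture weights

% Estimated angle for each time-frequency unit (3 channels x 2 frames).
unitAnglesDeg = [-38 30; -45 40; 33 -41];

% Von Mises density for angles x and mean mu, both in radians.
vonMises = @(x, mu) exp(kappa * cos(x - mu)) / (2 * pi * besseli(0, kappa));

numSources = numel(sourceAzimuthsDeg);
likelihood = zeros([size(unitAnglesDeg), numSources]);
for k = 1:numSources
    likelihood(:, :, k) = weights(k) * vonMises(unitAnglesDeg * pi/180, ...
                                                sourceAzimuthsDeg(k) * pi/180);
end

% Posterior probability of each source per time-frequency unit.
softMasks = likelihood ./ repmat(sum(likelihood, 3), [1 1 numSources]);
```

Each slice softMasks(:, :, k) is the soft-mask of source k, and the masks of all sources sum to one in every time-frequency unit.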
This knowledge source is obsolete and will be removed in a later release.
For the following knowledge sources, skeleton files already exist, but their functionality is not implemented yet.
This knowledge source will generate a hypothesis about the number of sound sources present in the auditory scene.
- MaEtAl2015dnn
Ma, N., Brown, G. J. and May, T. (2015) Robust localisation of multiple speakers exploiting deep neural networks and head movements. Proceedings of Interspeech'15, pp. 3302-3306, Dresden, Germany.
- MooreTan2004
Moore, B. C. J., & Tan, C. (2004) Development and Validation of a Method for Predicting the Perceived Naturalness of Sounds Subjected to Spectral Distortion. JAES, 52(9), 900–14.
- Wierstorf2014
Wierstorf, H. (2014) Perceptual Assessment of Sound Field Synthesis. PhD thesis, TU Berlin.