Knowledge sources

In this section we describe the knowledge sources that are currently available in the blackboard system.

Abstract knowledge source

The abstract knowledge source (AbstractKS class) is the base class for all knowledge sources in the blackboard system. The corresponding file AbstractKS.m is located in the src/blackboard_core directory as opposed to all implementations of actual knowledge sources that are located in the directory src/knowledge_sources. The listing below shows the parts most relevant to development of new knowledge sources:

classdef AbstractKS < handle
    properties
        blackboard;
        blackboardSystem;
        invocationMaxFrequency_Hz;
        trigger;
    end
    events
        KsFiredEvent
    end
    methods (Abstract)
        canExecute()
        execute()
    end
    methods
        focus()
        unfocus()
    end
end

There are different aspects of functionality in this interface.

Data access

Knowledge sources have a handle to the blackboard. Through this handle, data can be placed on and retrieved from the blackboard.
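As a rough sketch of what this looks like in practice (the exact argument lists of the blackboard's addData and getLastData methods are assumptions here), a knowledge source might write and read hypotheses like this:

% Inside a knowledge source method; obj.blackboard is the shared handle.
% Place a hypothesis under a named data category (argument list assumed).
obj.blackboard.addData( 'myHypotheses', hyp, false, obj.trigger.tmIdx );

% Retrieve the most recent data another knowledge source placed.
azHyp = obj.blackboard.getLastData( 'sourcesAzimuthsDistributionHypotheses' );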

System setup

Through the handle blackboardSystem, knowledge sources get access to the methods for adding and removing other knowledge sources, and also access to the BlackboardMonitor.
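As an illustrative sketch (assuming addKS and the BlackboardMonitor's bind method behave as their names suggest; MyOtherKS is a hypothetical class), a knowledge source could extend the running system like this:

% Instantiate and register a further knowledge source at runtime.
newKs = MyOtherKS();                               % hypothetical class
obj.blackboardSystem.addKS( newKs );

% Wire the new knowledge source to this one's firing event.
obj.blackboardSystem.blackboardMonitor.bind( {obj}, {newKs} );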

Execution properties

The property invocationMaxFrequency_Hz specifies how often this knowledge source is allowed to be executed. The methods focus and unfocus give access to the attentional priority of the knowledge source, which influences its relative importance when competing for computing resources with other knowledge sources. See Section sec-scheduler for a description of scheduling.
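These properties are typically set and used inside a concrete knowledge source; a minimal sketch (the chosen frequency is arbitrary, and when focus is called depends on the system design):

% In a concrete knowledge source constructor:
obj.invocationMaxFrequency_Hz = 10;   % execute at most ten times per second

% Raise the attentional priority while engaged in an important task,
% and release it again afterwards.
obj.focus();
% ... time-critical processing ...
obj.unfocus();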

Execution conditional

The abstract method canExecute must be implemented by the inheriting knowledge source. It is called by the scheduler when the knowledge source is next in the schedule before actually executing. If this method returns false, execution will not be performed. The second output argument of this method indicates whether the knowledge source should remain in the agenda or be removed.
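A minimal implementation in an inheriting class might look as follows; the energy check, its helper, and its threshold are purely illustrative assumptions:

function [bExecute, bRemainInAgenda] = canExecute( obj )
    % Only execute if the current signal block carries enough energy
    % (hypothetical helper and threshold, for illustration only).
    bExecute = obj.currentBlockEnergy() > obj.energyThreshold;
    % If not executed now, do not keep this invocation in the agenda.
    bRemainInAgenda = false;
end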

Execution

The main functionality of any knowledge source is implemented in the method execute. A knowledge source gets executed by the scheduler if its maximum invocation frequency would not be exceeded and its canExecute method returns true. In this method, a knowledge source gets access to its trigger, a structure containing information about the triggering event, the triggering source, and an argument the triggering source placed for use by its sinks.
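A skeletal execute implementation, under the same caveats (deriveHypothesis is a hypothetical helper, and the addData argument list is assumed), could read:

function execute( obj )
    % obj.trigger describes the triggering event, its source, and any
    % argument the source attached for its sinks.

    % Derive a new hypothesis from blackboard data (hypothetical helper).
    hyp = obj.deriveHypothesis( obj.trigger );

    % Publish the result and fire the standard event.
    obj.blackboard.addData( 'myHypotheses', hyp );
    notify( obj, 'KsFiredEvent' );
end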

Events

Knowledge sources can define their own individual events. However, each class already inherits a standard event from AbstractKS, KsFiredEvent. Events can be triggered by knowledge sources via obj.notify(eventname, attachedData).
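A sketch of a knowledge source defining and firing its own event (the class is illustrative; note that Matlab's notify requires attached data to derive from event.EventData, for which a wrapper such as BlackboardEventData can be used):

classdef MySpecialKS < AbstractKS     % illustrative class
    events
        MyCustomEvent                 % in addition to inherited KsFiredEvent
    end
    methods
        function execute( obj )
            % ... produce results (omitted) ...
            % Attached data must derive from event.EventData.
            obj.notify( 'MyCustomEvent', BlackboardEventData( obj.trigger.tmIdx ) );
        end
    end
end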

Auditory front-end knowledge source: AuditoryFrontEndKS

This knowledge source integrates the auditory front-end into the blackboard system. The auditory front-end itself is a self-contained module; this section focuses on its integration within the framework.

The knowledge source is connected to the blackboard and the robot interface by registering itself in the system via BlackboardSystem.setDataConnect. Upon construction, the dataObject and managerObject are instantiated and connected to the ear signal stream of the robot interface. The maximum invocation frequency of the AuditoryFrontEndKS is set to infinity. Execution mainly consists of getting the latest chunk of ear signal data, processing it through the auditory front-end, and notifying a KsFiredEvent.
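For illustration, a typical system setup registering the AuditoryFrontEndKS as the data source might look like this (the BlackboardSystem constructor argument and the robot handle are assumptions):

% Create the blackboard system and connect robot and auditory front-end.
bbs = BlackboardSystem( 0 );              % constructor argument assumed
bbs.setRobotConnect( robot );             % handle to the robot interface
bbs.setDataConnect( 'AuditoryFrontEndKS' );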

Other knowledge sources can register requests with the auditory front-end indirectly, by inheriting from the AuditoryFrontEndDepKS class (see Section sec-afe-dep-knowledge-source) and binding to its KsFiredEvent.

Auditory signal dependent knowledge source superclass: AuditoryFrontEndDepKS

Whenever a knowledge source needs signals, cues or features from the auditory front-end, it should subclass AuditoryFrontEndDepKS. Any knowledge source added to the blackboard through the BlackboardSystem methods addKS or createKS has its requests registered automatically with the auditory front-end.

Setting up the requests

Inheriting knowledge sources need to pass their requests in their call to the super-constructor: obj@AuditoryFrontEndDepKS(requests), with requests being a cell array of structures, each with the fields name, stating the requested signal name, and params, specifying the signal parameters. (Have a look at sec-afe-processors.)

An example looks like this:

requests{1}.name = 'modulation';
requests{1}.params = genParStruct( ...
   'nChannels', obj.amFreqChannels, ...
   'am_type', 'filter', ...
   'am_nFilters', obj.amChannels ...
   );
requests{2}.name = 'ratemap_magnitude';
requests{2}.params = genParStruct( ...
   'nChannels', obj.freqChannels ...
   );

The params field always needs to be populated by a call to the genParStruct function.
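Putting both pieces together, a minimal inheriting class could pass its requests to the super-constructor like this (class name and parameter values are illustrative):

classdef MyFeatureKS < AuditoryFrontEndDepKS   % illustrative class
    methods
        function obj = MyFeatureKS()
            requests{1}.name = 'ratemap_magnitude';
            requests{1}.params = genParStruct( 'nChannels', 32 );
            obj = obj@AuditoryFrontEndDepKS( requests );
        end
    end
end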

Accessing signals

These requested signals can then be accessed by the knowledge source via the inherited getAFEdata method, which returns a map of handles to the actual signal objects, keyed by the indices used in the request structure.

An example, according to the requests example above, looks like this:

afeData = obj.getAFEdata();
modSobj = afeData(1);
rmSobj = afeData(2);
rmBlock = rmSobj.getSignalBlock(0.5,0);

A more elaborate description of the request parameter structure and the signal objects can be found in the help for the Two!Ears Auditory Front-End <../afe/index>. Have a look at the implementation of the GmtkLocationKS to see a real-world example of how to subclass AuditoryFrontEndDepKS.

Localisation knowledge sources

Four knowledge sources work together to generate hypotheses of sound source azimuths: Location knowledge source, Confusion Detection knowledge source, Confusion Solving knowledge source, and Head Rotation knowledge source.

Location knowledge source: DnnLocationKS

Class DnnLocationKS implements knowledge about the statistical relationship between spatial cues and azimuth locations using deep neural networks (DNNs). Currently the DNNs are trained on binaural cues from the auditory front-end, including cross-correlation and ILD cues, as described in more detail in [MaEtAl2015dnn].

This knowledge source requires signals from the auditory front-end and thus inherits from the AuditoryFrontEndDepKS (Section sec-afe-dep-knowledge-source) and needs to be bound to the AuditoryFrontEndKS's KsFiredEvent. The canExecute precondition checks the energy level of the current signal block, and localisation takes place only if there is an actual auditory event. After execution, a SourcesAzimuthsDistributionHypothesis containing a probability distribution of azimuth locations is placed on the blackboard (category sourcesAzimuthsDistributionHypotheses) and the event KsFiredEvent is notified.

binds to AuditoryFrontEndKS.KsFiredEvent
writes data category sourcesAzimuthsDistributionHypotheses
triggers event KsFiredEvent
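A setup sketch for this knowledge source, assuming the BlackboardMonitor's bind method accepts source and sink lists plus a binding mode:

% Instantiate the localisation knowledge source and bind it to the
% auditory front-end's firing event ('replaceOld' mode assumed).
locKs = bbs.createKS( 'DnnLocationKS' );
bbs.blackboardMonitor.bind( {bbs.dataConnect}, {locKs}, 'replaceOld' );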

Location knowledge source: GmtkLocationKS

Class GmtkLocationKS implements knowledge about the statistical relationship between spatial cues and azimuth locations. Currently we model the relationship using Gaussian mixture models (GMMs), which are trained on binaural cues from the auditory front-end, including ITD and ILD cues.

This knowledge source requires signals from the auditory front-end and thus inherits from the AuditoryFrontEndDepKS (Section sec-afe-dep-knowledge-source) and needs to be bound to the AuditoryFrontEndKS's KsFiredEvent. The canExecute precondition checks the energy level of the current signal block, and localisation takes place only if there is an actual auditory event. After execution, a SourcesAzimuthsDistributionHypothesis containing a probability distribution of azimuth locations is placed on the blackboard (category sourcesAzimuthsDistributionHypotheses) and the event KsFiredEvent is notified.

binds to AuditoryFrontEndKS.KsFiredEvent
writes data category sourcesAzimuthsDistributionHypotheses
triggers event KsFiredEvent

Confusion detection knowledge source: ConfusionKS

The ConfusionKS checks new location hypotheses and decides whether there is a confusion. A confusion emerges when there are more valid locations in the hypotheses than assumed auditory sources in the scene. In case of a confusion, a ConfusedLocations event is notified and the responsible location hypothesis is placed on the blackboard in the confusionHypotheses category. Otherwise, a PerceivedAzimuth object is added to the blackboard perceivedAzimuths data category, and the standard event is triggered.

binds to GmtkLocationKS.KsFiredEvent
reads data category sourcesAzimuthsDistributionHypotheses
writes data category confusionHypotheses or perceivedAzimuths
triggers event ConfusedLocations or KsFiredEvent

Confusion solving knowledge source: ConfusionSolvingKS

The ConfusionSolvingKS solves localisation confusions by predicting the location probability distribution after head rotation and comparing it with new location hypotheses received once the head rotation is completed. The canExecute method waits for a new location hypothesis; when one arrives, it checks whether the head has been turned, and does not execute otherwise. The confusion is then solved by using the old and the new location hypotheses, and a PerceivedAzimuth object is placed on the blackboard.

binds to ConfusionKS.ConfusedLocations
reads data category confusionHypotheses, headOrientation and sourcesAzimuthsDistributionHypotheses
writes data category perceivedAzimuths
triggers event KsFiredEvent

Head rotation knowledge source: RotationKS

The RotationKS has knowledge on how to move the robotic head in order to solve confusions in source localisation. If there is no other head rotation already scheduled, the knowledge source uses the robot interface to turn the head.

binds to ConfusionKS.ConfusedLocations
reads data category confusionHypotheses, headOrientation
writes data category headOrientation

Identification knowledge sources

This section focuses on implementation of sound identification knowledge sources within the blackboard framework.

Identity knowledge source: IdentityKS

Objects of class IdentityKS implement source type models by incorporating an instance of a model (which has to implement the models.Base interface) that holds knowledge about the relationship between auditory cues and certain sound source types. Several identity knowledge sources can be used concurrently; usually, for each sound class to be identified, you would instantiate an object of class IdentityKS with the respective model. The models get loaded from directories you specify upon construction, and should be created with the identification training pipeline (sec-idTrainPipeline). The model object of IdentityKS can employ any kind of model, such as a linear support vector machine or a Gaussian mixture model. The IdentityKS needs access to auditory signals, and is thus a subclass of AuditoryFrontEndDepKS (see Section sec-afe-dep-knowledge-source). The model object holds the signal request structure.

The knowledge source predicts, based on the incorporated source model, whether the currently received auditory stream includes an auditory object of the sound type it represents.

binds to AuditoryFrontEndKS.KsFiredEvent
writes data category identityHypotheses
triggers event KsFiredEvent

Have a look at the example sec-examples-identification to see IdentityKS in action.
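As a sketch of such concurrent use (the model directory and class labels are illustrative, and the bind call follows the assumptions stated earlier), one IdentityKS per sound class could be set up like this:

% One IdentityKS per sound class, each loading its model from disk.
idKss{1} = bbs.createKS( 'IdentityKS', {'speech', modelDir} );
idKss{2} = bbs.createKS( 'IdentityKS', {'alarm',  modelDir} );
bbs.blackboardMonitor.bind( {bbs.dataConnect}, idKss, 'replaceOld' );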

Identity decision knowledge source: IdDecisionKS

The identity decision knowledge source checks new identity hypotheses. It then decides which of them are valid by comparing them and incorporating knowledge about the number of assumed auditory objects in the scene.

binds to IdentityKS.KsFiredEvent
reads data category identityHypotheses
writes data category identityDecision
triggers event KsFiredEvent

Identity Live Debugging knowledge source: IdTruthPlotKS

This is not really a knowledge source in the strict sense of the word, but rather a way to enable live inspection of the identity information in the blackboard system. Upon construction, it takes ground truth information about event labels and onset and offset times; when triggered, it displays this information in comparison with the actual hypotheses created by the identity knowledge sources. The figure fig-identify shows an example of the produced plot.

binds to IdentityKS.KsFiredEvent
reads data category identityHypotheses

Sound quality related knowledge sources

Coloration knowledge source: ColorationKS

The class ColorationKS implements the prediction of the perceived change in timbre of an auditory event compared to a reference. The reference is not fixed, but is learned and stored inside the blackboard memory. At the moment, learning is implemented in a very simple fashion: the first signal the knowledge source is confronted with is learned as the reference, and all later signals are compared to that reference. The colorationHypotheses value is then between 0 and 1, although it is not hard-limited and can be larger than 1 for some conditions. The actual value is calculated using the naturalness model from [MooreTan2004], which compares the weighted excitation patterns of the reference and the test stimulus.

binds to AuditoryFrontEndKS.KsFiredEvent
reads data category colorationReference
writes data category colorationHypotheses or colorationReference
triggers event KsFiredEvent

Location knowledge source: ItdLocationKS

As the current implementations of GmtkLocationKS and DnnLocationKS are not able to predict localisation reliably under difficult conditions, we introduced a different location knowledge source as an intermediate solution. This ItdLocationKS is optimised for predicting the perceived direction of a sound source created by a spatial audio system. It uses only ITD cues below 1400 Hz and utilises a lookup table to match those values to the corresponding angles. This implies that the knowledge source is not able to distinguish between front and back. Besides the lookup table, it also applies outlier detection when integrating the perceived angles over the different frequency channels, as suggested in [Wierstorf2014]. The output of the knowledge source is a sourcesAzimuthsDistributionHypotheses identical to the output of the GmtkLocationKS or DnnLocationKS.
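The lookup step can be pictured as follows; this is an illustrative sketch with made-up table values and ITD estimates, not the actual table shipped with the knowledge source:

% Hypothetical lookup table mapping ITD values (seconds) to azimuths.
lookupItd = linspace( -7e-4, 7e-4, 37 );
lookupAz  = linspace( -90, 90, 37 );

% Example per-channel ITD estimates below 1400 Hz.
itdPerChannel = [2.1e-4, 1.9e-4, 2.6e-4, 2.0e-4];

% Interpolate an azimuth per channel, then combine across channels;
% the outlier rejection of [Wierstorf2014] is omitted here.
azPerChannel = interp1( lookupItd, lookupAz, itdPerChannel );
azEstimate   = median( azPerChannel );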

binds to AuditoryFrontEndKS.KsFiredEvent
writes data category sourcesAzimuthsDistributionHypotheses
triggers event KsFiredEvent

Segmentation knowledge sources

This section focuses on implementation of knowledge sources for the segmentation of auditory features within the blackboard framework.

Segmentation knowledge source: SegmentationKS

The segmentation knowledge source generates hypotheses about the assignment of individual time-frequency units to the sound sources present in a scene. This assignment is done probabilistically; hence, each time-frequency unit is associated with a unique discrete probability distribution. These distributions can be interpreted as soft-masks which can be used to generate segmented auditory features. Specifically, each auditory feature that can be represented in the time-frequency domain can be modified accordingly by a corresponding soft-mask. The soft-masks are generated by a probabilistic clustering approach based on a mixture of von Mises distributions over the estimated angular positions of the sound sources. These positions can either be estimated by the SegmentationKS itself or provided by a SourcesAzimuthsDistributionHypothesis on the blackboard. If not all source positions can be reliably estimated by the DnnLocationKS, the remaining positions are estimated during the segmentation process. All estimated positions are stored, together with corresponding circular uncertainties, in a sourceAzimuthHypotheses object for each sound source. Additionally, the estimated soft-masks are stored in a source-specific segmentationHypotheses object. Each sourceAzimuthHypotheses and segmentationHypotheses object contains a unique source identifier tag, enabling other knowledge sources to associate each soft-mask with the corresponding source position. The current implementation of the SegmentationKS relies on a predefined number of sound sources being present in the scene.
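Applying a soft-mask reduces to an element-wise weighting of a time-frequency feature; a minimal sketch with placeholder matrices:

% Placeholder [frequency x time] feature and matching soft-mask.
ratemap  = rand( 32, 100 );
softMask = rand( 32, 100 );   % per-unit probabilities for one source

% Element-wise weighting yields the segmented feature for that source.
segmentedRatemap = ratemap .* softMask;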

binds to AuditoryFrontEndKS.KsFiredEvent
reads data category sourcesAzimuthsDistributionHypotheses
writes data category sourceAzimuthHypotheses and segmentationHypotheses
triggers event KsFiredEvent

Obsolete knowledge sources

Acoustic cues knowledge source: AcousticCuesKS

This knowledge source is obsolete and will be removed in a later release.

Upcoming knowledge sources

Skeleton files already exist for the following knowledge sources, but their functionality is not implemented yet.

Number of sources knowledge source: SourceNumberKS

This knowledge source will generate a hypothesis about the number of sound sources present in the auditory scene.

MaEtAl2015dnn

Ma, N., Brown, G. J., and May, T. (2015) Robust localisation of multiple speakers exploiting deep neural networks and head movements. Proceedings of Interspeech'15, pp. 3302–3306, Dresden, Germany.

MooreTan2004

Moore, B. C. J., and Tan, C. (2004) Development and Validation of a Method for Predicting the Perceived Naturalness of Sounds Subjected to Spectral Distortion. Journal of the Audio Engineering Society, 52(9), 900–914.

Wierstorf2014

Wierstorf, H. (2014) Perceptual Assessment of Sound Field Synthesis. PhD thesis, TU Berlin.