This section describes the implementation of knowledge sources that segregate auditory streams and apply soft-masks to auditory features, isolating each stream within the blackboard framework.
The stream segregation knowledge source generates hypotheses about the
assignment of individual time-frequency units to the sound sources present in a
scene. This assignment is probabilistic: each time-frequency unit is associated
with a unique discrete probability distribution over sources. These
distributions can be interpreted as soft-masks, which can be used to generate
segregated auditory features. Specifically, any auditory feature that can be
represented in the time-frequency domain can be modified accordingly by a
corresponding soft-mask. The soft-masks are generated by a probabilistic
clustering approach based on a mixture of von Mises distributions over the
estimated angular positions of the sound sources. Estimates of these positions
are provided by a locationHypothesis or, if unavailable, by
sourcesAzimuthsDistributionHypotheses on the blackboard. Positions can be
reliably estimated through the combination of DnnLocationKS and
LocalisationDecisionKS. Additionally, the estimated soft-masks are stored in a
sound-source-specific segmentationHypothesis object. Each of these hypotheses
contains a unique source identifier tag, enabling other knowledge sources to
associate each soft-mask with the corresponding source position. The current
implementation of the StreamSegregationKS relies on a pre-defined number of
sound sources in the scene, which is provided through the
NumberOfSourcesHypotheses.
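To make the clustering step concrete, the following is a minimal sketch of how a mixture of von Mises distributions can be fitted to per-unit azimuth estimates with expectation-maximisation, so that the posterior responsibilities act as soft-masks. This is an illustrative Python sketch, not the actual StreamSegregationKS implementation; the function name and all parameters are assumptions, and the concentration update uses a standard approximation from the mean resultant length.

```python
import numpy as np
from scipy.stats import vonmises

def vonmises_mixture_soft_masks(azimuths, n_sources, n_iter=50, seed=0):
    """Cluster per-unit azimuth estimates (radians) with a mixture of
    von Mises distributions via EM. The returned responsibilities can be
    read as soft-mask values: one column per assumed sound source.
    Hypothetical helper, not part of the framework."""
    azimuths = np.asarray(azimuths, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise component means at random azimuth samples,
    # with moderate concentration and uniform mixture weights.
    mu = np.array(rng.choice(azimuths, n_sources, replace=False))
    kappa = np.full(n_sources, 2.0)
    weights = np.full(n_sources, 1.0 / n_sources)
    resp = np.full((azimuths.size, n_sources), 1.0 / n_sources)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each source for each unit.
        lik = np.stack([w * vonmises.pdf(azimuths, k, loc=m)
                        for w, k, m in zip(weights, kappa, mu)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: update weights, circular means, and concentrations.
        weights = resp.mean(axis=0)
        for j in range(n_sources):
            r = resp[:, j]
            C = np.sum(r * np.cos(azimuths))
            S = np.sum(r * np.sin(azimuths))
            mu[j] = np.arctan2(S, C)
            Rbar = np.hypot(C, S) / r.sum()
            # Approximate kappa from the mean resultant length.
            kappa[j] = Rbar * (2.0 - Rbar**2) / (1.0 - Rbar**2 + 1e-9)
        kappa = np.clip(kappa, 1e-3, 100.0)
    return resp, mu
```

Each row of the returned responsibility matrix sums to one, matching the description above of a discrete probability distribution per time-frequency unit.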
| binds to | AuditoryFrontEndKS.KsFiredEvent |
| reads data category | locationHypothesis (otherwise sourcesAzimuthsDistributionHypotheses) and NumberOfSourcesHypotheses |
| writes data category | segmentationHypotheses |
| triggers event | KsFiredEvent |
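As a small illustration of how a stored soft-mask might be applied to a time-frequency feature once retrieved from a segmentation hypothesis, the sketch below weights each unit by its mask value. The function name and shape conventions (frames by frequency channels) are assumptions for illustration only.

```python
import numpy as np

def apply_soft_mask(tf_feature, soft_mask):
    """Weight a time-frequency feature (frames x channels) by a soft-mask
    of the same shape: values near 1 keep a unit, values near 0 suppress it.
    Hypothetical helper, not part of the framework."""
    tf_feature = np.asarray(tf_feature, dtype=float)
    soft_mask = np.asarray(soft_mask, dtype=float)
    if tf_feature.shape != soft_mask.shape:
        raise ValueError("feature and mask shapes must match")
    if np.any((soft_mask < 0.0) | (soft_mask > 1.0)):
        raise ValueError("soft-mask values must lie in [0, 1]")
    return tf_feature * soft_mask
```

Applying each source's mask in turn yields one segregated copy of the feature per source, consistent with the per-source segmentation hypotheses described above.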