Allow selection of channel when loading / creating audio as mono #409

declension · 2019-01-16T11:28:41Z

Changes proposed in this pull request

Allow remix to choose a channel, and just copy results from that channel if required
ffmpeg version - use pan audio filter (see section in manual).
Propagate this parameter through all the audio loading layers
Add tests around all aspects of this

This fixes #211.

* Allow `remix` to choose a channel, and just copy results from that channel if required * `ffmpeg` version - use `pan` audio filter (see [section in manual](https://trac.ffmpeg.org/wiki/AudioChannelManipulation)). * Propagate this parameter through all the audio loading layers * Add tests around all aspects of this This fixes CPJKU#211.

superbock

Nice work, very useful addition!

Besides the specific details, it would be nice if you could reformat the calling arguments to fill all lines up to 79 characters and then add a line break (I have not checked if some arguments would fit the line before, I just looks like this might be the case).

madmom/audio/signal.py

superbock · 2019-01-17T09:16:19Z

madmom/audio/signal.py

@@ -510,6 +516,9 @@ class Signal(np.ndarray):
    num_channels : int, optional
        Reduce or expand the signal to `num_channels` channels, or 'None'
        to return the signal with its original channels.
+    channel : int, optional
+        When reducing a signal to `num_channels` of 1, use this channel,
+        or 'None' to return the average across all channels.


This is the nicest and most descriptive docstring of channel in the whole module. Would be nice if you replace the others with this one (and adapt if needed).

Yes I liked this one. IIRC I used a this different description because a lot of them just pass through, so don't really do the logic themselves, but I can propagate this through everywhere

madmom/io/audio.py

declension · 2019-01-17T09:35:38Z

Nice work, very useful addition!

Thanks!

Besides the specific details, it would be nice if you could reformat the calling arguments to fill all lines up to 79 characters and then add a line break (I have not checked if some arguments would fit the line before, I just looks like this might be the case).

Yes, I tried both. Personally I have a slight preference for logical grouping (number of channels and channel) when there are lots of arguments, but I'll change this back where it fits

superbock · 2019-01-17T09:37:16Z

It would be nice if you could add the same channel selection mechanism to audio.signal.Stream as well. The logic there seems to be a bit more complicated however, since it is not possible to select the channel directly. One has to request the number of channels and then only use the correct one.

I have this code lying around somewhere, not sure how much changes happened to that class in the meantime...

class Stream(object):
   """
   A Stream handles live (i.e. online, real-time) audio input via PyAudio.

   Parameters
   ----------
   sample_rate : int
       Sample rate of the signal.
   num_channels : int, optional
       Number of channels. Per default, only the first input channel is used.
       Additionally, the following literals are supported:

       - 'left' : use only the left (i.e. first) channel
       - 'right' : use only the right (i.e. second) channel
       - 'mono' : down-mix all input channels to mono

   dtype : numpy dtype, optional
       Data type for the signal.
   frame_size : int, optional
       Size of one frame [samples].
   hop_size : int, optional
       Progress `hop_size` samples between adjacent frames.
   fps : float, optional
       Use given frames per second; if set, this computes and overwrites the
       given `hop_size` value (the resulting `hop_size` must be an integer).
   queue_size : int
       Size of the FIFO (first in first out) queue. If the queue is full and
       new audio samples arrive, the oldest item in the queue will be dropped.

   Notes
   -----
   Stream is implemented as an iterable which operates on a blocking queue.
   The queue is filled by PyAudio and iterating over the stream returns the
   live audio signal frame by frame. It blocks when no new data is available.

   """
   QUEUE_SIZE = 1

   def __init__(self, sample_rate=SAMPLE_RATE, num_channels=NUM_CHANNELS,
                dtype=np.float32, frame_size=FRAME_SIZE, hop_size=HOP_SIZE,
                fps=FPS, queue_size=QUEUE_SIZE, **kwargs):
       # import PyAudio here and not at the module level
       import pyaudio
       try:
           from Queue import Queue  # Python 2
       except ImportError:
           from queue import Queue  # Python 3

       # set attributes
       self.sample_rate = sample_rate
       # channel indicates which channel to select if there are more than one
       self.channel = None
       if num_channels in (None, 1):
           self.num_channels = 1
       elif num_channels == 'left':
           self.num_channels = 2
           # get the first channel
           self.channel = 0
       elif num_channels == 'right':
           self.num_channels = 2
           # get the second channel
           self.channel = 1
       # wanted dtype (PyAudio is set to float32 always)
       self.dtype = dtype
       if frame_size:
           self.frame_size = int(frame_size)
       if fps:
           # use fps instead of hop_size
           hop_size = self.sample_rate / float(fps)
       if int(hop_size) != hop_size:
           raise ValueError(
               'only integer `hop_size` supported, not %s' % hop_size)
       self.hop_size = int(hop_size)
       # set up a queue
       self.queue_size = queue_size
       self.queue = Queue(maxsize=self.queue_size)
       # init PyAudio for recording
       self.pa = pyaudio.PyAudio()
       self.stream = self.pa.open(rate=self.sample_rate,
                                  channels=self.num_channels,
                                  format=pyaudio.paFloat32, input=True,
                                  frames_per_buffer=self.hop_size,
                                  stream_callback=self.callback, start=True)

       # create a buffer
       self.buffer = BufferProcessor(self.frame_size)
       # frame index counter
       self.frame_idx = 0
       # PyAudio flags
       self.paComplete = pyaudio.paComplete
       self.paContinue = pyaudio.paContinue

   def __iter__(self):
       return self

   def __next__(self):
       return self.queue.get()

   next = __next__

   def callback(self, data, *args):
       """
       Callback function called by PyAudio every time a new audio chunk is
       ready.

       Parameters
       ----------
       data : str
           Audio data represented as a string.

       Returns
       -------
       data : numpy array
           Audio data as numpy array.
       flag : PortAudio callback flag
           Indicate whether to continue the stream.

       """
       # get the data from the stream
       data = np.fromstring(data, 'float32').astype(self.dtype, copy=False)
       num_channels = 1
       # multiple channels are interleaved, reshape
       if self.num_channels > 1:
           np.reshape(data, (-1, self.num_channels))
           num_channels = self.num_channels
           # extract the wanted channel
           if self.channel is not None:
               data = data[:, self.channel]
               num_channels = 1
       # buffer the data
       data = self.buffer(data)
       # wrap it as a Signal (including the start position)
       # TODO: check float/int hop size
       start = (self.frame_idx * float(self.hop_size) / self.sample_rate)
       signal = Signal(data, sample_rate=self.sample_rate, dtype=self.dtype,
                       num_channels=num_channels, start=start)
       # if the queue if full erase the first item from the queue
       if self.queue.qsize() >= self.queue_size:
           # TODO: how to warn if we missed a frame? Raise an error,
           #       or is a simple warning enough? Maybe a counter...
           import warnings
           warnings.warn('dropping frame...')
           self.queue.get()
       # put the signal into the queue
       self.queue.put(signal, block=False)
       # increment the frame index
       self.frame_idx += 1
       # return data and indicate that the PyAudio stream should continue
       # TODO: returning the data is pointless, since this callback is only
       #       used for recording, not for playback
       return data, self.paContinue

   def is_running(self):
       return self.stream.is_active()

   def close(self):
       self.stream.close()
       # TODO: is this the correct place to terminate PyAudio?
       self.pa.terminate()

I can also add this if you prefer me to do that.

superbock · 2019-01-17T09:39:16Z

Yes, I tried both. Personally I have a slight preference for logical grouping (number of channels and channel) when there are lots of arguments, but I'll change this back where it fits

I totally agree, but I'd prefer to have it consistent throughout the codebase.

superbock · 2019-01-17T09:53:23Z

Your comment above made me rethink the whole channel selection mechanism. How about deprecating num_channels (or at least discourage its usage) and go for channels instead? Channels can be one of the following:

a single int, indicating which channel to use (basically your changes)
a literal {mono, left, right}
a list of int, indicating which channels to use

Whenever a single channel is selected, num_channels=1 would be set automatically.

The last option is not of great usage as of now, since STFT can only operate on mono signals (see #371), but it would be nice to be able to have it this way in the future.

declension · 2019-01-17T09:58:42Z

It would be nice if you could add the same channel selection mechanism to audio.signal.Stream as well. The logic there seems to be a bit more complicated however, since it is not possible to select the channel directly. One has to request the number of channels and then only use the correct one.

I have this code lying around somewhere, not sure how much changes happened to that class in the meantime...

Interesting, I've not used Stream - perhaps that could be a separate PR...

declension · 2019-01-17T10:12:23Z

Your comment above made me rethink the whole channel selection mechanism. How about deprecating num_channels (or at least discourage its usage) and go for channels instead? Channels can be one of the following:
* a single `int`, indicating which channel to use (basically your changes)

* a literal `{mono, left, right}`

AIUI these are slightly convention-driven

* a list of int, indicating which channels to use
Whenever a single channel is selected, num_channels=1 would be set automatically.

Actually I thought about something like this, but started to scope creep way out of the original #211.

But whilst the discussion is going I guess, some thoughts I'd had 😄 ...

If you have multiple channels to select, you eventually need to start mapping source to destination (e.g. downmixing 5.1, ffmpeg itself can do some crazy stuff here) but it's ugly and perhaps other libraries do it better.

But if we concentrated on the simple cases, yes...

Explicit channels

channels = [0,1]
Picks the first two channels only. Equivalent in stereo to num_channels=None

Mono downmix

channels = [0,1], num_channels=1
(or even upmixing from mono at some point)
equivalent to (for stereo):
num_channels=1

The last option is not of great usage as of now, since STFT can only operate on mono signals (see #371), but it would be nice to be able to have it this way in the future.

Hmm yes

superbock · 2019-01-17T10:22:15Z

Ok, I'd vote to keep it simple and stupid for now. No fancy channel selection mechanisms, no multi-channel support or whatsoever. Literals can be added in a separate PR, as can be fancy downmixing options (i.e. which channels to select for downmixing). But with the design as it is now, all of these options are possible in future without breaking anything.

* Update docstrings to indicate behaviour when omitting `num_channels` * Make ffmpeg versions support this same behaviour

declension · 2019-01-17T14:34:56Z

@superbock Think that's everything...

superbock · 2019-01-17T15:07:58Z

Very nice work, thanks again!

superbock requested changes Jan 17, 2019

View reviewed changes

Revert accidental change

0bbefe8

Code style / comment fixes in line with PR review

ea3d0f9

Clarify num_channels and channel interaction for audio loading

d526395

* Update docstrings to indicate behaviour when omitting `num_channels` * Make ffmpeg versions support this same behaviour

superbock merged commit 41155f4 into CPJKU:master Jan 17, 2019

declension deleted the feature/signal-channel-selection branch January 17, 2019 15:08

superbock mentioned this pull request Jan 17, 2019

add channel selection mechanism to Stream #410

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow selection of channel when loading / creating audio as mono #409

Allow selection of channel when loading / creating audio as mono #409

declension commented Jan 16, 2019

superbock left a comment

superbock Jan 17, 2019

declension Jan 17, 2019

declension commented Jan 17, 2019

superbock commented Jan 17, 2019 •

edited

superbock commented Jan 17, 2019

superbock commented Jan 17, 2019

declension commented Jan 17, 2019

declension commented Jan 17, 2019

superbock commented Jan 17, 2019

declension commented Jan 17, 2019

superbock commented Jan 17, 2019

Allow selection of channel when loading / creating audio as mono #409

Allow selection of channel when loading / creating audio as mono #409

Conversation

declension commented Jan 16, 2019

Changes proposed in this pull request

superbock left a comment

Choose a reason for hiding this comment

superbock Jan 17, 2019

Choose a reason for hiding this comment

declension Jan 17, 2019

Choose a reason for hiding this comment

declension commented Jan 17, 2019

superbock commented Jan 17, 2019 • edited

superbock commented Jan 17, 2019

superbock commented Jan 17, 2019

declension commented Jan 17, 2019

declension commented Jan 17, 2019

Explicit channels

Mono downmix

superbock commented Jan 17, 2019

declension commented Jan 17, 2019

superbock commented Jan 17, 2019

superbock commented Jan 17, 2019 •

edited