Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow selection of channel when loading / creating audio as mono #409

Merged
merged 4 commits into from Jan 17, 2019

Conversation

declension
Copy link
Contributor

Changes proposed in this pull request

  • Allow remix to choose a channel, and just copy results from that channel if required
  • ffmpeg version - use pan audio filter (see section in manual).
  • Propagate this parameter through all the audio loading layers
  • Add tests around all aspects of this

This fixes #211.

 * Allow `remix` to choose a channel, and just copy results from that channel if required
 * `ffmpeg` version - use `pan` audio filter (see [section in manual](https://trac.ffmpeg.org/wiki/AudioChannelManipulation)).
 * Propagate this parameter through all the audio loading layers
 * Add tests around all aspects of this

This fixes CPJKU#211.
Copy link
Collaborator

@superbock superbock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, very useful addition!

Besides the specific details, it would be nice if you could reformat the calling arguments to fill all lines up to 79 characters and then add a line break (I have not checked if some arguments would fit the line before, I just looks like this might be the case).

madmom/audio/signal.py Outdated Show resolved Hide resolved
madmom/audio/signal.py Outdated Show resolved Hide resolved
madmom/audio/signal.py Outdated Show resolved Hide resolved
@@ -510,6 +516,9 @@ class Signal(np.ndarray):
num_channels : int, optional
Reduce or expand the signal to `num_channels` channels, or 'None'
to return the signal with its original channels.
channel : int, optional
When reducing a signal to `num_channels` of 1, use this channel,
or 'None' to return the average across all channels.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the nicest and most descriptive docstring of channel in the whole module. Would be nice if you replace the others with this one (and adapt if needed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I liked this one. IIRC I used a this different description because a lot of them just pass through, so don't really do the logic themselves, but I can propagate this through everywhere

madmom/io/audio.py Show resolved Hide resolved
@declension
Copy link
Contributor Author

Nice work, very useful addition!

Thanks!

Besides the specific details, it would be nice if you could reformat the calling arguments to fill all lines up to 79 characters and then add a line break (I have not checked if some arguments would fit the line before, I just looks like this might be the case).

Yes, I tried both. Personally I have a slight preference for logical grouping (number of channels and channel) when there are lots of arguments, but I'll change this back where it fits

@superbock
Copy link
Collaborator

superbock commented Jan 17, 2019

It would be nice if you could add the same channel selection mechanism to audio.signal.Stream as well. The logic there seems to be a bit more complicated however, since it is not possible to select the channel directly. One has to request the number of channels and then only use the correct one.

I have this code lying around somewhere, not sure how much changes happened to that class in the meantime...

class Stream(object):
   """
   A Stream handles live (i.e. online, real-time) audio input via PyAudio.

   Parameters
   ----------
   sample_rate : int
       Sample rate of the signal.
   num_channels : int, optional
       Number of channels. Per default, only the first input channel is used.
       Additionally, the following literals are supported:

       - 'left' : use only the left (i.e. first) channel
       - 'right' : use only the right (i.e. second) channel
       - 'mono' : down-mix all input channels to mono

   dtype : numpy dtype, optional
       Data type for the signal.
   frame_size : int, optional
       Size of one frame [samples].
   hop_size : int, optional
       Progress `hop_size` samples between adjacent frames.
   fps : float, optional
       Use given frames per second; if set, this computes and overwrites the
       given `hop_size` value (the resulting `hop_size` must be an integer).
   queue_size : int
       Size of the FIFO (first in first out) queue. If the queue is full and
       new audio samples arrive, the oldest item in the queue will be dropped.

   Notes
   -----
   Stream is implemented as an iterable which operates on a blocking queue.
   The queue is filled by PyAudio and iterating over the stream returns the
   live audio signal frame by frame. It blocks when no new data is available.

   """
   QUEUE_SIZE = 1

   def __init__(self, sample_rate=SAMPLE_RATE, num_channels=NUM_CHANNELS,
                dtype=np.float32, frame_size=FRAME_SIZE, hop_size=HOP_SIZE,
                fps=FPS, queue_size=QUEUE_SIZE, **kwargs):
       # import PyAudio here and not at the module level
       import pyaudio
       try:
           from Queue import Queue  # Python 2
       except ImportError:
           from queue import Queue  # Python 3

       # set attributes
       self.sample_rate = sample_rate
       # channel indicates which channel to select if there are more than one
       self.channel = None
       if num_channels in (None, 1):
           self.num_channels = 1
       elif num_channels == 'left':
           self.num_channels = 2
           # get the first channel
           self.channel = 0
       elif num_channels == 'right':
           self.num_channels = 2
           # get the second channel
           self.channel = 1
       # wanted dtype (PyAudio is set to float32 always)
       self.dtype = dtype
       if frame_size:
           self.frame_size = int(frame_size)
       if fps:
           # use fps instead of hop_size
           hop_size = self.sample_rate / float(fps)
       if int(hop_size) != hop_size:
           raise ValueError(
               'only integer `hop_size` supported, not %s' % hop_size)
       self.hop_size = int(hop_size)
       # set up a queue
       self.queue_size = queue_size
       self.queue = Queue(maxsize=self.queue_size)
       # init PyAudio for recording
       self.pa = pyaudio.PyAudio()
       self.stream = self.pa.open(rate=self.sample_rate,
                                  channels=self.num_channels,
                                  format=pyaudio.paFloat32, input=True,
                                  frames_per_buffer=self.hop_size,
                                  stream_callback=self.callback, start=True)

       # create a buffer
       self.buffer = BufferProcessor(self.frame_size)
       # frame index counter
       self.frame_idx = 0
       # PyAudio flags
       self.paComplete = pyaudio.paComplete
       self.paContinue = pyaudio.paContinue

   def __iter__(self):
       return self

   def __next__(self):
       return self.queue.get()

   next = __next__

   def callback(self, data, *args):
       """
       Callback function called by PyAudio every time a new audio chunk is
       ready.

       Parameters
       ----------
       data : str
           Audio data represented as a string.

       Returns
       -------
       data : numpy array
           Audio data as numpy array.
       flag : PortAudio callback flag
           Indicate whether to continue the stream.

       """
       # get the data from the stream
       data = np.fromstring(data, 'float32').astype(self.dtype, copy=False)
       num_channels = 1
       # multiple channels are interleaved, reshape
       if self.num_channels > 1:
           np.reshape(data, (-1, self.num_channels))
           num_channels = self.num_channels
           # extract the wanted channel
           if self.channel is not None:
               data = data[:, self.channel]
               num_channels = 1
       # buffer the data
       data = self.buffer(data)
       # wrap it as a Signal (including the start position)
       # TODO: check float/int hop size
       start = (self.frame_idx * float(self.hop_size) / self.sample_rate)
       signal = Signal(data, sample_rate=self.sample_rate, dtype=self.dtype,
                       num_channels=num_channels, start=start)
       # if the queue if full erase the first item from the queue
       if self.queue.qsize() >= self.queue_size:
           # TODO: how to warn if we missed a frame? Raise an error,
           #       or is a simple warning enough? Maybe a counter...
           import warnings
           warnings.warn('dropping frame...')
           self.queue.get()
       # put the signal into the queue
       self.queue.put(signal, block=False)
       # increment the frame index
       self.frame_idx += 1
       # return data and indicate that the PyAudio stream should continue
       # TODO: returning the data is pointless, since this callback is only
       #       used for recording, not for playback
       return data, self.paContinue

   def is_running(self):
       return self.stream.is_active()

   def close(self):
       self.stream.close()
       # TODO: is this the correct place to terminate PyAudio?
       self.pa.terminate()

I can also add this if you prefer me to do that.

@superbock
Copy link
Collaborator

Yes, I tried both. Personally I have a slight preference for logical grouping (number of channels and channel) when there are lots of arguments, but I'll change this back where it fits

I totally agree, but I'd prefer to have it consistent throughout the codebase.

@superbock
Copy link
Collaborator

Your comment above made me rethink the whole channel selection mechanism. How about deprecating num_channels (or at least discourage its usage) and go for channels instead? Channels can be one of the following:

  • a single int, indicating which channel to use (basically your changes)
  • a literal {mono, left, right}
  • a list of int, indicating which channels to use

Whenever a single channel is selected, num_channels=1 would be set automatically.

The last option is not of great usage as of now, since STFT can only operate on mono signals (see #371), but it would be nice to be able to have it this way in the future.

@declension
Copy link
Contributor Author

It would be nice if you could add the same channel selection mechanism to audio.signal.Stream as well. The logic there seems to be a bit more complicated however, since it is not possible to select the channel directly. One has to request the number of channels and then only use the correct one.

I have this code lying around somewhere, not sure how much changes happened to that class in the meantime...

Interesting, I've not used Stream - perhaps that could be a separate PR...

@declension
Copy link
Contributor Author

Your comment above made me rethink the whole channel selection mechanism. How about deprecating num_channels (or at least discourage its usage) and go for channels instead? Channels can be one of the following:

* a single `int`, indicating which channel to use (basically your changes)

* a literal `{mono, left, right}`

AIUI these are slightly convention-driven

* a list of int, indicating which channels to use

Whenever a single channel is selected, num_channels=1 would be set automatically.

Actually I thought about something like this, but started to scope creep way out of the original #211.

But whilst the discussion is going I guess, some thoughts I'd had 😄 ...

If you have multiple channels to select, you eventually need to start mapping source to destination (e.g. downmixing 5.1, ffmpeg itself can do some crazy stuff here) but it's ugly and perhaps other libraries do it better.

But if we concentrated on the simple cases, yes...

Explicit channels

channels = [0,1]
Picks the first two channels only. Equivalent in stereo to num_channels=None

Mono downmix

channels = [0,1], num_channels=1
(or even upmixing from mono at some point)
equivalent to (for stereo):
num_channels=1

The last option is not of great usage as of now, since STFT can only operate on mono signals (see #371), but it would be nice to be able to have it this way in the future.

Hmm yes

@superbock
Copy link
Collaborator

Ok, I'd vote to keep it simple and stupid for now. No fancy channel selection mechanisms, no multi-channel support or whatsoever. Literals can be added in a separate PR, as can be fancy downmixing options (i.e. which channels to select for downmixing). But with the design as it is now, all of these options are possible in future without breaking anything.

 * Update docstrings to indicate behaviour when omitting `num_channels`
 * Make ffmpeg versions support this same behaviour
@declension
Copy link
Contributor Author

@superbock Think that's everything...

@superbock superbock merged commit 41155f4 into CPJKU:master Jan 17, 2019
@superbock
Copy link
Collaborator

Very nice work, thanks again!

@declension declension deleted the feature/signal-channel-selection branch January 17, 2019 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add channel selection mechanism to Signal
2 participants