
Passing data or a Processor into RNNDownBeatProcessor instead of a filename #357

Closed
robclouth opened this issue Apr 4, 2018 · 14 comments
@robclouth

Hey!
I need to trim the audio before passing it into the RNNDownBeatProcessor, but I can't figure out how to pass data instead of a filename. Any ideas?

Thanks

@superbock
Collaborator

You have a couple of options to accomplish that:

  • trim the audio beforehand and pass it, together with the correct sample_rate, to the processor
  • simply add start and stop positions in seconds as processing arguments (when calling the processor)

HTH

@robclouth
Author

audio, sr = load_audio_file(audio_file)
audio = trim(audio)
act = RNNBeatProcessor()(audio, sample_rate=sr)

Like this?

@superbock
Collaborator

Yes, but the above only works if the original sample rate is 44.1kHz. Sorry for the misleading first answer. To work with any sample rate, do the following:

audio, _ = load_audio_file(audio_file, 44100)  # resamples the signal to 44.1kHz
audio = trim(audio)
act = RNNBeatProcessor()(audio)

If the sample rate is already 44.1kHz, you can omit the first line.

There should be no need to trim the signal, though. I assumed that you wanted to 'trim' the signal as in skipping the first N seconds or extracting snippets of a certain length; in that case act = RNNBeatProcessor()(audio_file, start=1, stop=10) would have been the easiest.

Some background information: RNNBeatProcessor requires audio sampled at 44.1 kHz in order to apply the same audio pre-processing (i.e. filtering) as when the neural network was trained. Thus, if the sample rate differs from the default 44.1 kHz, the signal needs to be resampled to exactly this rate.

@robclouth
Author

Great, thanks for that! I'm noticing that the returned beats are offset by around 40ms, arriving earlier than they should. The trimming was to see if the silence at the start was throwing it off. Do you know where that offset might be coming from? If you think it'll be consistently 40ms, I can manually adjust for it.

@superbock
Collaborator

Could you please check whether this is a constant offset (i.e. throughout the piece) or whether it affects only a couple of beats (e.g. the first ones)? If it is the former, I'd be very interested in checking it. If it is the latter, I fear there's not much you can do about it, since this is just what the network predicts.

I doubt that it has anything to do with leading or trailing silence in the files, but I cannot rule it out.

A question though: what are you doing with these raw activations? Usually, the final decision about the beats is done in a second step, e.g. by DBNBeatTrackingProcessor.

@bzvew

bzvew commented May 9, 2018

I cannot run RNNDownBeatProcessor on audio read by madmom. The error is as follows:

melody, sr = madmom.audio.signal.load_audio_file(file_name, sample_rate=44100, dtype='float')

print(melody.shape)
(9192960, 2)

RNNDownBeatProcessor()(melody)

~/Downloads/madmom/madmom/audio/stft.py in process(self, data, **kwargs)
475 circular_shift=self.circular_shift,
476 include_nyquist=self.include_nyquist,
--> 477 fft_window=self.fft_window, **kwargs)
478 # cache the window used for FFT
479 # Note: depending on the signal this may be scaled already

~/Downloads/madmom/madmom/audio/stft.py in new(cls, frames, window, fft_size, circular_shift, include_nyquist, fft_window, **kwargs)
334 data = stft(frames, fft_window, fft_size=fft_size,
335 circular_shift=circular_shift,
--> 336 include_nyquist=include_nyquist)
337
338 # cast as ShortTimeFourierTransform

~/Downloads/madmom/madmom/audio/stft.py in stft(frames, window, fft_size, circular_shift, include_nyquist)
73 # TODO: add multi-channel support
74 raise ValueError('frames must be a 2D array or iterable, got %s with '
---> 75 'shape %s.' % (type(frames), frames.shape))
76
77 # shape of the frames

ValueError: frames must be a 2D array or iterable, got <class 'madmom.audio.signal.FramedSignal'> with shape (20846, 1024, 2).

@superbock
Collaborator

RNNDownBeatProcessor expects a mono Signal, but there's no need to load the audio manually beforehand; simply pass file_name.

@bzvew

bzvew commented May 9, 2018

I want to combine several tracks together and use that mix as input to RNNDownBeatProcessor. So I have to write the mix to a file first, then read that file, right? Also, I want to confirm whether I should use the madmom.audio.signal.rescale function to make sure the audio amplitude is not too large.

@superbock
Collaborator

No, there's no need to save to a file first; you only have to downmix it to mono. I included the information about the filename since your example showed that you were reading a file.

You did not state how you combined the tracks, so I can only guess. If you used an external tool, reading from file is the easiest. If you did it in Python and the signal is available as an ndarray, you can use the remix function to downmix it and then instantiate a Signal. I hoped that for the latter case it would also work by passing num_channels to Signal, but this currently does not work, see #367. I'll submit a fix soon; then you can do:

s = Signal(audio_array, sample_rate, num_channels=1)

@bzvew

bzvew commented May 9, 2018

Sorry for missing that info. Below is what I do now:

melody, sr = madmom.audio.signal.load_audio_file(os.path.join(dir_name, melody_name), sample_rate=44100, dtype='float')

bass, _ = madmom.audio.signal.load_audio_file(os.path.join(dir_name, bass_name), sample_rate=44100, dtype='float')

mix = madmom.audio.signal.rescale(bass + melody)
RNNDownBeatProcessor()(mix)

@superbock
Collaborator

mono = madmom.audio.signal.remix(mix, 1)
RNNDownBeatProcessor()(mono, sample_rate=sr)

should do the trick.

@superbock
Collaborator

I merged #368; with the current master you can try whether it works like in your last comment (not entirely sure if it does).

@bzvew

bzvew commented May 9, 2018

I added the load_audio_file parameter num_channels=1, and that solves the problem!

@superbock
Collaborator

Of course, this solves it as well.
