Passing data or a Processor into RNNDownBeatProcessor instead of a filename #357
You have a couple of options to accomplish that:
HTH
Like this?
Yes, but the above only works if the original sample rate is 44.1 kHz. Sorry for the misleading first answer. To work with any sample rate, do the following:

```python
audio, _ = load_audio_file(audio_file, 44100)  # resamples the signal to 44.1 kHz
audio = trim(audio)
act = RNNBeatProcessor()(audio)
```

If the sample rate is already 44.1 kHz, you can omit the first line. There should be no need to

Some background information:
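For reference, what `trim` does conceptually can be sketched in plain numpy (this is a stand-in illustration, not madmom's actual implementation; `trim_silence` and the `threshold` value are hypothetical names chosen here):

```python
import numpy as np

def trim_silence(signal, threshold=1e-4):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    idx = np.nonzero(np.abs(signal) > threshold)[0]
    if idx.size == 0:
        return signal[:0]  # all silence: return an empty signal
    return signal[idx[0]:idx[-1] + 1]

sig = np.array([0.0, 0.0, 0.5, -0.3, 0.0])
trimmed = trim_silence(sig)  # keeps only the non-silent middle part
```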
Great! Thanks for that. I'm noticing that the returned beats are offset by around 40 ms, i.e. they come earlier than they should. The trimming was to see if the silence at the start was throwing it off. Do you know where that offset might be coming from? If it is consistently 40 ms, I can adjust it manually.
Could you please check whether this is a constant offset (i.e. throughout the piece) or whether it affects only a couple of beats (e.g. the first ones)? If it is the former, I'd be very interested in checking it. If it is the latter, I fear there's not much you can do about it, since this is just what the network predicts. I doubt that it has anything to do with leading or trailing silence in the files, but I cannot rule it out. A question though: what are you doing with these raw activations? Usually, the final decision about the beats is done in a second step, e.g. by
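If the offset does turn out to be constant, compensating for it on the returned beat times is a one-liner. A minimal sketch, assuming `beats` is an array of beat times in seconds and the hypothetical 40 ms value from the comment above (the sign depends on whether beats come early or late; here we assume they come early, so we add the offset):

```python
import numpy as np

beats = np.array([0.54, 1.04, 1.54, 2.04])  # example beat times in seconds
offset = 0.040                               # hypothetical constant 40 ms offset
corrected = beats + offset                   # shift later, since beats came early
```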
I cannot run RNNDownBeatProcessor on audio read by madmom:

```python
melody, sr = madmom.audio.signal.load_audio_file(file_name, sample_rate=44100, dtype='float')
print(melody.shape)
RNNDownBeatProcessor()(melody)
```

The error is as follows:

```
~/Downloads/madmom/madmom/audio/stft.py in process(self, data, **kwargs)
~/Downloads/madmom/madmom/audio/stft.py in new(cls, frames, window, fft_size, circular_shift, include_nyquist, fft_window, **kwargs)
~/Downloads/madmom/madmom/audio/stft.py in stft(frames, window, fft_size, circular_shift, include_nyquist)

ValueError: frames must be a 2D array or iterable, got <class 'madmom.audio.signal.FramedSignal'> with shape (20846, 1024, 2).
```
I want to combine several tracks and use that mix as input to RNNDownBeatProcessor. So I have to write the mix to a file first and then read that file, right? Also, I want to confirm: should I use the madmom.audio.signal.rescale function to make sure the audio amplitude is not too large?
No, there's no need to save to a file first; you only have to downmix it to mono. I included the information about the filename since your example showed that you are reading a file. You did not state how you combined the tracks, so I can only guess. If you used an external tool, reading from a file is the easiest. If you did it in Python and the signal is available as an ndarray, you can use the
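Conceptually, downmixing to mono just averages the channels. A minimal numpy sketch of that idea (a stand-in illustration; madmom's own remix function handles this for Signal objects):

```python
import numpy as np

stereo = np.array([[0.2, 0.4],
                   [0.6, 0.8],
                   [1.0, 0.0]])   # shape (num_samples, num_channels)
mono = stereo.mean(axis=1)       # average the channels into one
```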
Sorry for missing that info. Below is what I do now:

```python
melody, sr = madmom.audio.signal.load_audio_file(os.path.join(dir_name, melody_name), sample_rate=44100, dtype='float')
bass, _ = madmom.audio.signal.load_audio_file(os.path.join(dir_name, bass_name), sample_rate=44100, dtype='float')
mix = madmom.audio.signal.rescale(bass + melody)
```
```python
mono = madmom.audio.signal.remix(mix, 1)
RNNDownBeatProcessor()(mono, sample_rate=sr)
```

should do the trick.
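As an aside on why rescaling matters when summing tracks: the sum of two signals can exceed the valid amplitude range. A conceptual numpy sketch of peak-normalizing the mix (a stand-in illustration only; madmom's rescale function also deals with dtype conversions, which this sketch ignores):

```python
import numpy as np

melody = np.array([0.5, -0.5, 0.25])
bass = np.array([0.7, 0.1, -0.25])
mix = melody + bass              # summing can push samples outside [-1, 1]
peak = np.max(np.abs(mix))
if peak > 1.0:
    mix = mix / peak             # peak-normalize back into range
```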
I merged #368; you can try with the current master to see if it works the way you tried in your last comment (not entirely sure it does).
I added the load_audio_file parameter num_channels=1, and that solves the problem!
Of course, this solves it as well. |
Hey!
I need to trim the audio before passing it into the RNNDownBeatProcessor, but I can't figure out how to pass data instead of a filename. Any ideas?
Thanks