Aligning offsets with bars #5

Open
xstasi opened this issue Mar 18, 2023 · 2 comments

Comments

xstasi commented Mar 18, 2023

My question here only applies to songs where both the tempo and the time signature are known, but that should be most of the songs out there.

Imagining a song in 4/4 with 120 qpm that changes chord every 4 bars, you would have a change every 8 seconds (quarter is 60s/120=0.5s, bar is 0.5s * 4 = 2s, chord length is 2s * 4 = 8s). So the ideal output would be for example:

0.0   8.0   E:min
8.0   16.0  A:maj
16.0  24.0  E:min
[.....]
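
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `chord_grid` and its parameters are made up for this example, and it assumes a constant tempo with a quarter-note beat.

```python
def chord_grid(qpm, beats_per_bar, bars_per_chord, n_chords):
    """Return ideal (start, end) times in seconds for each chord.

    Hypothetical helper: assumes constant tempo and that the beat
    is a quarter note (e.g. 4/4).
    """
    quarter = 60.0 / qpm              # seconds per quarter note
    bar = quarter * beats_per_bar     # seconds per bar
    chord_len = bar * bars_per_chord  # seconds per chord
    return [(i * chord_len, (i + 1) * chord_len) for i in range(n_chords)]


# 4/4 at 120 qpm, one chord every 4 bars
for start, end in chord_grid(120, 4, 4, 3):
    print(start, end)
```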

In reality, the time offsets in the predictions are a bit wonky, probably because in real sound there is no exact moment when a chord starts. I have also tested this on a .wav render of a MIDI file.

If the tempo is low and the bars are long, durations can be sort of quantised "with a wrench hit" by rounding to the closest bar. But when the tempo is high enough (100+?), the timing error becomes too big relative to the bar length, making it impossible to pin down exactly where in the score the chord changes.
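
For reference, the crude rounding I mean looks roughly like this. It is a minimal sketch: `snap_to_bars` is a name invented for this example, and it assumes a constant tempo with the first bar starting at 0.0.

```python
def snap_to_bars(segments, bar_seconds):
    """Round each (start, end, label) segment to the nearest bar line.

    Hypothetical helper: assumes constant tempo and bars starting at 0.0.
    Segments that collapse to zero length after rounding are dropped,
    which is exactly where short bars (fast tempo) lose information.
    """
    snapped = []
    for start, end, label in segments:
        s = round(start / bar_seconds) * bar_seconds
        e = round(end / bar_seconds) * bar_seconds
        if e > s:
            snapped.append((s, e, label))
    return snapped


predicted = [(0.1, 7.8, "E:min"), (7.8, 16.3, "A:maj")]
print(snap_to_bars(predicted, 2.0))
```

With 2-second bars the offsets above land on the intended bar lines, but the same absolute error with shorter bars can round to the wrong bar.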

I don't know much about how your NN works, but perhaps this is because the wave is analysed "continuously"? Could it be made to analyse segments that are aligned with bars instead? In the previous song, for example, could the prediction function be made to guess which chord is there from 0.0 to 2.0, then from 2.0 to 4.0, etc.?

Thanks!

@cjbayron
Owner

perhaps this is because the wave is analysed "continuously"

Yes, that's pretty much the idea. The NN is not "bar-aware": it simply chunks the audio into ~50 ms segments, which are then grouped into non-overlapping segments of ~6 seconds. Each 6-second segment is one input to the NN, which outputs a chord label for each 50 ms segment in that group. Contiguous identical chord labels are then merged into a single label for a longer segment, e.g. 20 contiguous Am labels are combined into a single Am label covering 1000 ms. You could say that we let these chord labels tell us where the bars are, instead of the other way around.
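
The merging step can be illustrated roughly as follows. This is a simplified sketch, not the repository's actual code: `merge_frames` is a hypothetical name, and the frame length is fixed at exactly 50 ms for the example.

```python
from itertools import groupby


def merge_frames(labels, frame_ms=50):
    """Merge runs of identical per-frame labels into (start_ms, end_ms, label).

    Hypothetical helper: `labels` holds one chord label per frame,
    in time order, and every frame lasts `frame_ms` milliseconds.
    """
    segments = []
    index = 0  # index of the first frame of the current run
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((index * frame_ms, (index + n) * frame_ms, label))
        index += n
    return segments


# 20 contiguous Am frames become one 1000 ms segment
print(merge_frames(["Am"] * 20 + ["C"] * 10))
```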

could it be made to analyse segments that are aligned with bars instead

This sounds like a nice improvement! One way I can think of is to use a bar estimation algorithm (I don't have much experience with this, so I can't recommend one, but popular libraries such as Essentia may have something of the sort), run the NN on each bar, then pick out the most prominent chord from the NN output as the single chord label for that bar.
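
The "pick out the most prominent chord" part could look something like the sketch below. It is only an assumption-laden illustration: `chord_per_bar` is a hypothetical name, bar positions are derived from a known constant bar length rather than from a bar estimation algorithm, and the bar length is assumed to be a whole number of frames.

```python
from collections import Counter


def chord_per_bar(frame_labels, frame_ms, bar_ms):
    """Pick the most frequent frame label within each bar.

    Hypothetical helper: assumes bars start at 0 ms, have a constant
    length, and that bar_ms is a multiple of frame_ms.
    Returns (start_ms, end_ms, label) per bar; the last bar may be shorter.
    """
    frames_per_bar = bar_ms // frame_ms
    bars = []
    for first in range(0, len(frame_labels), frames_per_bar):
        window = frame_labels[first:first + frames_per_bar]
        label = Counter(window).most_common(1)[0][0]
        start = first * frame_ms
        bars.append((start, start + len(window) * frame_ms, label))
    return bars


# The chord change is detected 250 ms early, but voting per bar absorbs it
frames = ["E:min"] * 35 + ["A:maj"] * 45
print(chord_per_bar(frames, frame_ms=50, bar_ms=2000))
```

Voting over frames inside a fixed bar window means small timing errors near a bar line no longer shift the reported chord boundary.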

Author

xstasi commented Mar 19, 2023

Hi CJ, thanks for your explanation :) it helped me understand your code a little bit better, allowing me to implement a basic version of bar based guessing.

Made PR #6, hope you will like it :)
