FFT/Audio Post-processing as a Song Heuristic #86

Closed
stephen-huan opened this issue May 22, 2020 · 0 comments

Comments

@stephen-huan
Contributor

Following up on the conversation in #81 and #39: the song length heuristic is elegant and easy to program, but it has trouble with remixes, which are rare but still possible. NLP is harder to program and probably error-prone. More generally, neither gives a guarantee of true audio similarity.

Just adding another possible heuristic to the pot: combine song length with the Fast Fourier Transform (FFT). NumPy has an implementation, and the FFT can be used to directly compare two waveforms for similarity. In fact, the FFT can be used to minimize the L2 distance between two integer arrays (the sum of squared differences between the numbers at each index); I have an explanation here.
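As a rough illustration of the FFT/L2 idea, here is a minimal sketch. It assumes "minimize the L2 norm" means finding the circular shift of one array that minimizes the sum of squared differences to the other, which the FFT gives via cross-correlation. The function name `min_shift_l2_squared` is hypothetical, and real audio would first need decoding to equal-length sample arrays.

```python
import numpy as np


def min_shift_l2_squared(a, b):
    """Return (distance, shift): the smallest sum of squared differences
    between `a` and any circular shift of `b`, and the shift achieving it.

    Hypothetical helper for illustration; assumes equal-length 1-D inputs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D arrays of equal length")

    # Circular cross-correlation via the FFT:
    # corr[k] = sum_n a[n] * b[(n - k) mod N] = dot(a, np.roll(b, k))
    corr = np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n=len(a))

    # ||a - roll(b, k)||^2 = ||a||^2 + ||b||^2 - 2 * corr[k]
    dists = np.dot(a, a) + np.dot(b, b) - 2.0 * corr

    k = int(np.argmin(dists))
    # Clamp tiny negative values caused by floating-point round-off.
    return max(float(dists[k]), 0.0), k
```

This costs O(n log n) rather than the O(n^2) of trying every shift directly. Calling `min_shift_l2_squared(a, np.roll(a, 3))` should report a distance of roughly zero.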

Although there is a large body of research trying to compute music similarity, I think a simple algorithm is sufficient in this case since the songs compared should be almost identical.

However, this likely introduces a non-intuitive extra parameter, FFT_CUTOFF, which would probably have to be determined experimentally (if the FFT distance between two songs is > FFT_CUTOFF, warn the user that the song found is likely incorrect).

An algorithm other than the FFT would be fine too, as long as it works on the actual audio.

In summary:
First, check the songs to make sure they have similar lengths.
Then, run an FFT over the songs and compute the L2 distance between them.
If the value is > FFT_CUTOFF, pick another song or warn the user.
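
The three steps above could be sketched roughly as follows. This is one possible reading, not a settled design: it compares unit-normalized FFT magnitude spectra, which ignores phase and timing, and it assumes both songs are already decoded to mono sample arrays at the same sample rate. `LENGTH_TOLERANCE`, the `FFT_CUTOFF` value, and both function names are hypothetical placeholders; the cutoff would still need to be found experimentally.

```python
import numpy as np

LENGTH_TOLERANCE = 0.02  # hypothetical: max relative difference in length
FFT_CUTOFF = 0.25        # hypothetical placeholder, to be tuned on real songs


def spectral_distance(a, b):
    """L2 distance between unit-normalized FFT magnitude spectra.

    The result lies between 0 (identical spectra) and sqrt(2).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = max(len(a), len(b))
    # Zero-pad to a common length so both spectra have the same bins.
    spec_a = np.abs(np.fft.rfft(a, n=n))
    spec_b = np.abs(np.fft.rfft(b, n=n))
    # Normalize so that overall volume does not affect the distance.
    spec_a = spec_a / (np.linalg.norm(spec_a) or 1.0)
    spec_b = spec_b / (np.linalg.norm(spec_b) or 1.0)
    return float(np.linalg.norm(spec_a - spec_b))


def looks_like_same_song(a, b):
    """Step 1: length check. Steps 2 and 3: FFT distance against the cutoff."""
    longer = max(len(a), len(b))
    if longer == 0:
        return False
    if abs(len(a) - len(b)) / longer > LENGTH_TOLERANCE:
        return False
    return spectral_distance(a, b) <= FFT_CUTOFF
```

A caller would pick another candidate or warn the user whenever `looks_like_same_song` returns `False`.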

Pros:

  • Not an indirect heuristic, targets the exact thing we want (audio similarity)
  • Standard tool for wave analysis

Cons:

  • Could be slow depending on implementation (run async/in parallel?)
  • Not intuitive what the cutoff should be
  • More complex