Tweak augmentation settings. #2

Open
vigliensoni opened this issue Aug 4, 2022 · 1 comment

Comments

@vigliensoni

Hi Antoine,

I've been digging into this utility to prepare datasets for RAVE.

I've found a few things that I'd like to share with you so that we can improve it:

  1. Running resample on its own, without any data augmentation, raises the level by approximately 3.8 dB. I measured this with a phase-cancellation (null) test: aligning the original and the resampled signals, inverting the phase of one of them, and adjusting the level until they cancel each other. This might not be a big deal for many audio files, but with mastered files, or any file peaking above -3 dB, we will end up with a clipped signal.
  2. The audio compression of the augmented signal is a nice addition. However, it also raises the level of zones with soft background noise considerably. When playing live with RAVE I've found that there are plenty of zones like these, which is very annoying. The augmented version also fades in and out at the start and end of the audio file, which is unexpected (though not problematic).
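The null test from item 1 can also be approximated numerically. The following is a minimal sketch, not the procedure actually used above: it assumes both signals are already time-aligned lists of float samples at the same sample rate, and the helper names `level_offset_db` and `peak_dbfs` are hypothetical. It estimates the single gain that best maps the original onto the processed signal (a least-squares fit), then checks whether the processed peak exceeds full scale.

```python
import math


def level_offset_db(original, processed):
    """Estimate the gain in dB that best maps `original` onto `processed`.

    Assumption: both sequences are time-aligned and share a sample rate.
    The gain is the least-squares solution of processed ~= gain * original.
    """
    n = min(len(original), len(processed))
    energy = sum(x * x for x in original[:n])
    if energy == 0.0:
        raise ValueError("original signal is silent")
    cross = sum(x * y for x, y in zip(original[:n], processed[:n]))
    gain = cross / energy
    if gain <= 0.0:
        raise ValueError("signals are uncorrelated or polarity-inverted")
    return 20.0 * math.log10(gain)


def peak_dbfs(samples):
    """Return the peak level in dBFS, where full scale is 1.0."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0.0 else 20.0 * math.log10(peak)


# Synthetic check: a 440 Hz tone peaking near -2.5 dBFS, boosted by 3.8 dB.
tone = [0.75 * math.sin(2.0 * math.pi * 440.0 * i / 44100.0) for i in range(4410)]
louder = [s * 10.0 ** (3.8 / 20.0) for s in tone]

offset = level_offset_db(tone, louder)
will_clip = peak_dbfs(louder) > 0.0
```

With real files, the samples would first have to be decoded and aligned; that step is left out here on purpose.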

The following image shows the same snippet of audio. The one on top is the original, the one in the middle is the resampled version, and the bottom one is the augmented one.

[Image: Screen Shot 2022-08-04 at 11 52 39 AM]

The silent zone in the original (top) peaks at -41 dB, while in the augmented version it peaks at -7 dB.

Don't you think this type of compression is a bit aggressive? Perhaps we could also benefit from implementing expansion for signals like these?
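For reference, going from -41 dB to -7 dB means about 34 dB of gain in the quiet zone. Below is a minimal sketch of the static gain curve a downward expander could apply to counter that. The function name `expander_gain_db` and the default threshold and ratio are hypothetical values chosen for illustration, not settings from this project, and a real expander would also need level detection with attack and release smoothing.

```python
def expander_gain_db(level_db, threshold_db=-30.0, ratio=2.0):
    """Return the gain in dB a downward expander applies at `level_db`.

    Levels at or above the threshold pass unchanged. Below it, every dB
    under the threshold is stretched to `ratio` dB under the threshold.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1.0 for downward expansion")
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)


# The quiet zone from the screenshot: 11 dB under the assumed threshold.
quiet_gain = expander_gain_db(-41.0)
# A loud passage is left alone.
loud_gain = expander_gain_db(-7.0)
```

A higher ratio pushes quiet zones down harder, at the cost of making low-level musical content disappear sooner.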

Thank you so much,

Gabriel

@vigliensoni

For the first issue, the ffmpeg dynaudnorm filter is what normalizes the audio to a certain level:

cmd += "-af \"dynaudnorm, silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-60dB\" "

This normalization is sometimes desirable, but not always. It could be made optional behind a flag, especially considering that it happens just by using the resample command.
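One way the optional normalization could look, as a rough sketch: the helper `build_af_argument` and its `normalize` parameter are hypothetical and do not exist in the current script. Only the filter strings are taken from the line quoted above.

```python
# Filter string copied from the existing command line.
SILENCE_REMOVE = "silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-60dB"


def build_af_argument(normalize=True):
    """Build the ffmpeg -af argument, with dynaudnorm only when requested."""
    filters = []
    if normalize:
        filters.append("dynaudnorm")
    filters.append(SILENCE_REMOVE)
    # Trailing space kept so the result can be appended to `cmd` as before.
    return '-af "{}" '.format(", ".join(filters))


cmd = "ffmpeg "  # placeholder prefix; the real command has more arguments
cmd += build_af_argument(normalize=False)
```

Defaulting `normalize` to `True` would keep the current behavior for existing users, while a command-line switch could pass `False` for mastered material.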
