Question: can we generate 1sec/ms length audio? #3
Do you mean trimming the audio to a desired length after generating a 10-second sample? That is easily done by truncating the generated wave in `generate`:

```python
def generate(self, prompt, steps=100, guidance=3, samples=1, disable_progress=True, desired_length_in_seconds=10):
    """Generate audio for a single prompt string."""
    with torch.no_grad():
        latents = self.model.inference([prompt], self.scheduler, steps, guidance, samples, disable_progress=disable_progress)
        mel = self.vae.decode_first_stage(latents)
        wave = self.vae.decode_to_waveform(mel)
        # Sampling rate is 16 kHz, so n seconds = n * 16000 samples
        wave = wave[:, :desired_length_in_seconds * 16000]
    return wave[0]
```

However, constraining the generated audio such that the events described in the text appear within the first […]

---
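The truncation step above can be sketched in isolation. This is a minimal, self-contained illustration of the slicing arithmetic (the `trim_waveform` helper and the zero-filled waveform are hypothetical stand-ins, not part of the repository):

```python
import numpy as np

SAMPLE_RATE = 16000  # the model decodes audio at 16 kHz

def trim_waveform(wave: np.ndarray, desired_seconds: float) -> np.ndarray:
    """Truncate a (batch, samples) waveform to the desired duration."""
    n_samples = int(desired_seconds * SAMPLE_RATE)
    return wave[:, :n_samples]

# A batch of one 10-second generated waveform.
wave = np.zeros((1, 10 * SAMPLE_RATE))
short = trim_waveform(wave, 1.0)
print(short.shape)  # (1, 16000)
```

Note that this only discards the tail of the signal; it does not change *when* the described events occur inside the generated 10 seconds.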
Yes, I meant getting the events within *n* seconds. Do you mean that if I train it on short audio files, I get short results too? What length should the dataset clips be, in your opinion? And what do you think should be done to control the length of the audio?

---
You need to train on shorter audio samples to achieve that control. The […]

---
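One hedged sketch of what "training on shorter samples" could mean in practice: preprocessing the training audio to a fixed, shorter clip length by trimming or zero-padding. The `to_fixed_length` helper and the 2-second target are assumptions for illustration, not the repository's actual pipeline:

```python
import numpy as np

SAMPLE_RATE = 16000
CLIP_SECONDS = 2  # hypothetical target clip length for retraining

def to_fixed_length(wave: np.ndarray, seconds: int = CLIP_SECONDS) -> np.ndarray:
    """Trim or zero-pad a mono waveform to exactly `seconds` of audio."""
    target = seconds * SAMPLE_RATE
    if wave.shape[-1] >= target:
        return wave[..., :target]
    pad = target - wave.shape[-1]
    # Zero-pad only the last (time) axis.
    return np.pad(wave, [(0, 0)] * (wave.ndim - 1) + [(0, pad)])
```

A model trained on such fixed-length clips would naturally generate audio of that duration, which is the control being discussed.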
Thanks 😁

---
The homepage shows only 10 seconds for all the audio samples. I want to know whether the audio length is controllable, or whether there is a minimum, as in AudioLDM, where you can't generate less than 2.5 seconds.