Question: can we generate 1sec/ms length audio? #3

Closed
wassimbj opened this issue Apr 28, 2023 · 4 comments

@wassimbj

wassimbj commented Apr 28, 2023

The homepage shows only 10-second clips for all the audio samples. I want to know whether the audio length is controllable, or whether there is a minimum, as in AudioLDM, where you can't generate less than 2.5 seconds.

@deepanwayx
Collaborator

Do you mean trimming the audio to a desired length after generating a 10-second sample? This is easily done by truncating the generated waveform in tango.py:

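def generate(self, prompt, steps=100, guidance=3, samples=1, disable_progress=True, desired_length_in_seconds=10):
    """ Generate audio for a single prompt string, trimmed to desired_length_in_seconds. """
    # Method of the Tango class; assumes torch is imported at module level, as the original code already requires.
    with torch.no_grad():
        latents = self.model.inference([prompt], self.scheduler, steps, guidance, samples, disable_progress=disable_progress)
        mel = self.vae.decode_first_stage(latents)
        wave = self.vae.decode_to_waveform(mel)
        # Sampling rate is 16 kHz: keep the first desired_length_in_seconds * 16000 samples.
        # Slice with ":" (indexing would return a single sample); int() lets fractional seconds such as 0.5 work.
        wave = wave[:, :int(desired_length_in_seconds * 16000)]
    return wave[0]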
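As a minimal sketch of the same trimming outside Tango, the helper below works on any NumPy array shaped `(batch, num_samples)` and takes the duration in milliseconds, since the question asks about sub-second lengths. `trim_waveform` and `SAMPLE_RATE` are hypothetical names introduced here; the 16 kHz rate comes from the comment in the code above. Note that this only shortens the clip; it does not move the described events into the part that is kept.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Tango's output rate, per the comment in generate() above


def trim_waveform(wave: np.ndarray, duration_ms: float, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Keep only the first `duration_ms` milliseconds of each waveform in a batch.

    `wave` is assumed (hypothetically) to have shape (batch, num_samples).
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    # 1 ms at 16 kHz is 16 samples.
    num_samples = int(round(duration_ms * sample_rate / 1000))
    # Slicing past the end is safe: NumPy just returns every available sample.
    return wave[:, :num_samples]


# Example: a fake batch of two 10-second clips.
batch = np.zeros((2, 10 * SAMPLE_RATE), dtype=np.float32)
print(trim_waveform(batch, 1000).shape)  # (2, 16000)
print(trim_waveform(batch, 250).shape)   # (2, 4000)
```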

However, constraining the generated audio so that the events described in the text appear within the first n seconds is not straightforward. Because of the nature of the training dataset, the events described in the text prompt tend to be spread over the entire 10-second duration of the generated audio.
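One possible post-hoc workaround, not suggested in this thread, is to keep the most energetic n-second window instead of the first n seconds. The sketch below assumes a 1-D waveform at 16 kHz and treats the largest sum of squared samples as a rough proxy for "where the event is"; `loudest_window` is a hypothetical helper. It is only a heuristic: quiet events, background noise, or prompts describing several events spread across the clip can all defeat it.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed output rate, as in generate() above


def loudest_window(wave: np.ndarray, duration_s: float, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Return the `duration_s`-second span of a 1-D waveform with the highest energy.

    Heuristic: a window's energy is the sum of its squared samples.
    """
    win = int(round(duration_s * sample_rate))
    if win <= 0:
        raise ValueError("duration_s must be positive")
    if win >= len(wave):
        return wave
    # Prefix sums of squared samples give every window's energy in O(n).
    # Casting to float64 also avoids overflow if the waveform is int16.
    energy = np.concatenate(([0.0], np.cumsum(wave.astype(np.float64) ** 2)))
    window_energy = energy[win:] - energy[:-win]  # window_energy[i] covers wave[i:i + win]
    start = int(np.argmax(window_energy))
    return wave[start:start + win]


# Example: a silent 10-second clip with a 1-second "event" starting at 6 s.
clip = np.zeros(10 * SAMPLE_RATE)
clip[6 * SAMPLE_RATE:7 * SAMPLE_RATE] = 0.5
segment = loudest_window(clip, 1.0)
print(len(segment), segment.min())  # 16000 0.5
```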

@wassimbj
Author

Yes, I meant getting the events within n seconds. Do you mean that if I train it on short audio files, I will get short results too? In your opinion, what length should the dataset clips be? And what do you think should be done to control the length of the audio?

wassimbj changed the title from "Question: can we generate 1sec/ms sounds ?" to "Question: can we generate 1sec/ms length audio?" Apr 28, 2023
@deepanwayx
Collaborator

You need to train on shorter audio samples to achieve this control. The duration variable in train.py specifies the length of the audio in seconds. It is set to 10; you can reduce it to a smaller number and train with appropriately short audio samples.
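For the retraining route, the training clips need to match the shorter duration. A minimal sketch, assuming the data pipeline expects every clip to be exactly `duration` seconds of 16 kHz audio: `fit_to_duration` is a hypothetical helper (not part of train.py) that zero-pads short clips and truncates long ones.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed training sample rate, matching Tango's 16 kHz output


def fit_to_duration(wave: np.ndarray, duration_s: float, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Zero-pad or truncate a 1-D waveform to exactly `duration_s` seconds (hypothetical helper)."""
    target = int(round(duration_s * sample_rate))
    if len(wave) >= target:
        return wave[:target]
    # np.pad's default mode is "constant", which pads with zeros (silence).
    return np.pad(wave, (0, target - len(wave)))


duration = 1  # hypothetical reduced value for the duration setting in train.py
short = fit_to_duration(np.ones(12_000), duration)  # 0.75 s clip, padded
long = fit_to_duration(np.ones(40_000), duration)   # 2.5 s clip, truncated
print(short.shape, long.shape)  # (16000,) (16000,)
print(short[-1], long[-1])      # 0.0 1.0
```

Truncation can cut off the very event the caption describes, so it is probably safer to start from source clips that are already about `duration` seconds long than to crop 10-second clips.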

@wassimbj
Author

wassimbj commented Apr 29, 2023

Thanks 😁
