Training HiFIGAN for XTTSv2 #3659
Unanswered
JahidBasher
asked this question in
General Q&A
Replies: 0 comments
As per my understanding, hifigan_decoder uses gpt_latent and speaker_embedding to convert the mel spectrogram into an audio signal. But HiFi-GAN was possibly trained on audio segments instead of full audio. This necessitates aligning the audio segment with gpt_latent for loss calculation. How can I align them? I have coded the following snippet for the HiFi-GAN dataloader. Is it the correct way to do it?
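For illustration, one way such an alignment is commonly done in vocoder dataloaders is to sample the segment in latent-frame units and slice the matching span of audio samples using the decoder's total upsample factor. The sketch below is a hypothetical example, not the original snippet: `align_segment` and `samples_per_latent` are names I introduce here, and the actual samples-per-frame ratio must be taken from the real decoder configuration.

```python
import numpy as np

def align_segment(audio, latents, samples_per_latent, segment_frames, rng=None):
    """Pick a random window of latent frames and the matching audio slice.

    audio: 1-D waveform (full utterance).
    latents: (T, D) frame-aligned GPT latents for the same utterance.
    samples_per_latent: audio samples produced per latent frame
        (hypothetical value; use the decoder's total upsample factor).
    segment_frames: training segment length, in latent frames.
    """
    rng = rng or np.random.default_rng()
    max_start = latents.shape[0] - segment_frames
    if max_start < 0:
        raise ValueError("utterance shorter than the requested segment")
    # Sample the start in latent-frame units so both views stay aligned.
    start = int(rng.integers(0, max_start + 1))
    latent_seg = latents[start:start + segment_frames]
    audio_seg = audio[start * samples_per_latent:
                      (start + segment_frames) * samples_per_latent]
    return latent_seg, audio_seg
```

Because the random offset is drawn once in latent-frame units and then scaled by `samples_per_latent`, the audio slice always covers exactly the span the latent window corresponds to, so the vocoder loss compares matching regions.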