[FastPitch] Why do you hierarchically predict the variance features (pitch and energy)? #1357

@changjinhan

Thank you for always sharing such thoughtful code.

As we can see in the FastPitch code, you add the pitch embedding to the encoder output before passing it to the energy predictor:

enc_out = enc_out + pitch_emb.transpose(1, 2)

Why did you choose hierarchical prediction of the variance features instead of parallel prediction as in FastSpeech 2 (paper version)?
Are there any performance advantages?
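
To make the question concrete, here is a minimal, framework-free sketch of the two wirings as I understand them. Every name in it (`make_predictor`, `make_embedding`, `hierarchical`, `parallel`) is a hypothetical stand-in and not a FastPitch identifier; the predictors are toy per-frame linear maps, and only the residual-add step is meant to mirror the line quoted above.

```python
# Toy sketch: a sequence is a list of frames, each frame a list of floats (T x C).
# All helpers are hypothetical stand-ins, not taken from the FastPitch code.

def make_predictor(weights):
    """Per-frame linear map producing one scalar (pitch or energy) per frame."""
    def predict(frames):
        return [sum(w * x for w, x in zip(weights, frame)) for frame in frames]
    return predict

def make_embedding(weights):
    """Lift each predicted scalar back to a C-dimensional vector."""
    def embed(values):
        return [[w * v for w in weights] for v in values]
    return embed

def add(a, b):
    """Element-wise sum of two (T x C) sequences."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def hierarchical(enc_out, pitch_pred, energy_pred, pitch_embed, energy_embed):
    pitch = pitch_pred(enc_out)
    enc_out = add(enc_out, pitch_embed(pitch))  # same role as the quoted line
    energy = energy_pred(enc_out)               # input already carries pitch
    return add(enc_out, energy_embed(energy)), pitch, energy

def parallel(enc_out, pitch_pred, energy_pred, pitch_embed, energy_embed):
    pitch = pitch_pred(enc_out)
    energy = energy_pred(enc_out)               # same input as the pitch predictor
    out = add(add(enc_out, pitch_embed(pitch)), energy_embed(energy))
    return out, pitch, energy

# Arbitrary toy values, chosen only so the difference is visible.
enc_out = [[1.0, 0.0], [0.0, 2.0]]
pitch_pred = make_predictor([1.0, 1.0])
energy_pred = make_predictor([0.5, -0.5])
pitch_embed = make_embedding([1.0, 0.0])
energy_embed = make_embedding([0.0, 1.0])

out_h, pitch_h, energy_h = hierarchical(
    enc_out, pitch_pred, energy_pred, pitch_embed, energy_embed)
out_p, pitch_p, energy_p = parallel(
    enc_out, pitch_pred, energy_pred, pitch_embed, energy_embed)
```

In this toy setup the pitch prediction is identical in both wirings; only the energy prediction differs, because in the hierarchical version its input already contains the pitch embedding.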

Labels: enhancement (New feature or request)
