Quick (Singing) Voice Conversion #200
base: main
Conversation
Hi @CuiLvYing, thanks for your efforts! Would you please attach some demos (such as the generated voices or your WebUI's video) like PR #56?
Of course! Here are some test demo videos and audios: 1.mp4, 2.mp4, source.mp4, result.mp4, 5.mp4 (https://github.com/open-mmlab/Amphion/assets/166400963/f752ea9d-a950-4831-bd30-ffd9fb6fd6f5). You can even have a look at our running demo WebUI now: https://24a8ca30d15dff216c.gradio.live
Sorry, I found that the target audio used was not uploaded. Here it is: target.mp4
Hi @CuiLvYing, I'm confused about your samples. For VC, the converted audio should speak the source's content with the target's timbre. Please use your model to convert the samples of PR #201, and then we can compare yours :)
I think we were attempting to make the person from the "source" speak the content of the "target", which is the opposite of your definition; we'll amend this soon. Samples: source1.mp4, target1.mp4, result1.mp4, source2.mp4, target2.mp4, result2.mp4, source3.mp4, target3.mp4, result3.mp4
The naturalness, especially the intelligibility, is poor to me. So I recommend not merging this PR unless there is a substantial improvement. @Adorable-Qin Please review the code and documentation carefully.
✨ Description
This is an implementation of a simple WebUI that provides quick, text-free, one-shot voice conversion for the uninitiated. Theoretically, the user only needs two short audios (source and target) and a few minutes to receive the VC result.
It is designed to use a base model (checkpoint) trained on the VCTK and M4Singer datasets (or other supported datasets) as a foundation, then fine-tune the base model using the input audio for voice conversion and output the result. It currently supports MultipleContentSVC and VITS.
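The two-step flow described above (load a pre-trained base checkpoint, fine-tune it on one uploaded clip, then convert the other) could be sketched as follows. This is a minimal illustration only; the function name `one_shot_vc` and the step strings are hypothetical, not Amphion's actual API.

```python
# Hypothetical sketch of the one-shot VC pipeline described in this PR.
# None of these names are Amphion's real API; they only illustrate the flow.

SUPPORTED_MODELS = ("MultipleContentSVC", "VITS")


def one_shot_vc(source_path: str, target_path: str,
                model: str = "MultipleContentSVC") -> list[str]:
    """Return the sequence of steps the WebUI would run for one conversion."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    return [
        f"load {model} base checkpoint (pre-trained on VCTK / M4Singer)",
        f"fine-tune the base model on the uploaded audio: {target_path}",
        f"convert {source_path} with the fine-tuned model and save the result",
    ]


# Example: list the steps for a VITS-based conversion.
for step in one_shot_vc("source.mp4", "target.mp4", model="VITS"):
    print(step)
```

The point of the sketch is the ordering: fine-tuning happens per request on the user's short clip, which is why the description estimates "a few minutes" per conversion rather than an instant inference call.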
🚧 Related Issues
None
👨‍💻 Changes Proposed
If any exist, please refer to the commits.
🧑‍🤝‍🧑 Who Can Review?
[Please use the '@' symbol to mention any community member who is free to review the PR once the tests have passed. Feel free to tag members or contributors who might be interested in your PR.]
@zhizhengwu @RMSnow @Adorable-Qin
📝 TODO
✅ Checklist