Found a way to keep the voice timbre fixed #123

Open
kunzhengstart opened this issue May 31, 2024 · 7 comments
Labels
stale The topic has been ignored for a long time

Comments

@kunzhengstart

The approach:
1. Generate rand_spk and save its weights to a CSV file:

import csv
import torch

def write_to_csv(csv_file_path, data):
    # Write the speaker embedding as a single CSV row
    with open(csv_file_path, mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(data.tolist())

# Split the saved statistics into std and mean
std, mean = torch.load("spk_stat.pt").chunk(2)

# Sample a random speaker embedding and save it for reuse
rand_spk = torch.randn(768) * std + mean
write_to_csv("saved.csv", rand_spk.detach().numpy())
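
A quick way to check that the CSV round trip in step 1 is lossless is sketched below. It is only an illustration under assumptions: Python's `random` module stands in for `torch.randn`, the `std` and `mean` values are made-up scalars rather than the contents of `spk_stat.pt`, and `sample_embedding`, `save_embedding`, and `load_embedding` are hypothetical helper names.

```python
import csv
import os
import random
import tempfile

def sample_embedding(dim, std, mean, seed):
    """Draw dim values from N(mean, std^2); a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * std + mean for _ in range(dim)]

def save_embedding(path, values):
    """Write the embedding as a single CSV row."""
    with open(path, mode="w", newline="") as f:
        csv.writer(f).writerow(values)

def load_embedding(path):
    """Read the single CSV row back as a list of floats."""
    with open(path, newline="") as f:
        row = next(csv.reader(f))
    return [float(x) for x in row]

# Hypothetical values: 768 matches the embedding size used above,
# std and mean are placeholders, not the real statistics.
emb = sample_embedding(dim=768, std=0.5, mean=0.1, seed=1234)

path = os.path.join(tempfile.mkdtemp(), "saved.csv")
save_embedding(path, emb)
restored = load_embedding(path)

# Python writes floats in a form that parses back to the same value,
# so the reloaded list should compare equal to the original.
print(restored == emb)
```

If the reloaded values match, any remaining variation in the voice comes from sampling at inference time rather than from the saved embedding.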

2. When generating speech, load the saved weights. Note that temperature must be set to an extremely small value.

import pandas as pd
import scipy.io.wavfile
import torch

# `chat` is assumed to be an already initialized ChatTTS instance (not shown here)
data = pd.read_csv("./saved.csv", header=None)
rand_spk = torch.tensor(data.iloc[0].to_numpy(), dtype=torch.float32)
params_infer_code = {
    'spk_emb': rand_spk,  # reuse the saved speaker embedding
    'temperature': 1e-12,  # extremely low custom temperature
    'top_P': 0.7,  # top P decode
    'top_K': 20,  # top K decode
}
params_refine_text = {
    'prompt': '[break_2]'
}
wavs = chat.infer("your text",
                  params_refine_text=params_refine_text,
                  params_infer_code=params_infer_code,
                  use_decoder=True)
scipy.io.wavfile.write(filename="./chattts_download.wav", rate=24_000, data=wavs[0].T)
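
Why a tiny temperature helps can be illustrated with a small sketch. It assumes the usual definition, where logits are divided by the temperature before the softmax; whether ChatTTS applies temperature exactly this way is not confirmed in this thread. The logits are made up and `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed stably by subtracting the max."""
    top = max(logits)
    scaled = [(x - top) / temperature for x in logits]
    exps = [math.exp(s) for s in scaled]  # very negative values underflow to 0.0
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens

for t in (1.0, 0.3, 1e-12):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

As the temperature shrinks, nearly all probability mass moves to the highest-scoring candidate, so sampling behaves almost like always picking the top token. That reduces run-to-run variation but does not by itself guarantee identical output.
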
@ZaymeShaw
Contributor

Does adjusting the temperature fix the voice completely, or does it only keep the timbre within an approximate range?

@kunzhengstart
Author

voices.zip
Four voice clips showing the result.

@ddkwing

ddkwing commented May 31, 2024

> voices.zip Four voice clips showing the result.

That is really quite good. The timbre is fairly consistent now.

@fastfading

Can it be tuned to produce a cute young girl's voice?
Something sweet and coquettish.

@glovebx

glovebx commented May 31, 2024

I ran four texts in a row. Comparing them, the last three clips still differ quite noticeably from the first.

@fastfading

What is it good for if the timbre can't even be fixed?

@halfong

halfong commented Jun 14, 2024

I tried this myself. Fixing the following parameters largely fixes the timbre:

```json
{
    "text_seed_input": 87067822,
    "audio_seed_input": 78448590,
    "params_refine_text": {
        "prompt": "[oral_2][laugh_0][break_1][speed_4]"
    },
    "enable_refine_text": true,
    "lang": "en",
    "temperature": 0.3,
    "top_P": 0.005,
    "top_K": 1
}
```
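
One possible reading of why these settings stabilize the output: with `top_K` set to 1, only the single most probable token survives filtering, so decoding is effectively greedy. The sketch below shows generic top-k followed by top-p (nucleus) filtering on a made-up distribution. It is not ChatTTS's actual implementation, and `top_k_top_p_filter` is a hypothetical helper.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return the token indices that survive top-k, then top-p (nucleus) filtering."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

probs = [0.05, 0.60, 0.25, 0.10]  # made-up distribution over four tokens

# Settings from the first post: more than one candidate can remain.
print(top_k_top_p_filter(probs, top_k=20, top_p=0.7))
# Settings from this comment: a single candidate remains.
print(top_k_top_p_filter(probs, top_k=1, top_p=0.005))
```

When a single candidate remains, the choice at that step no longer depends on the temperature or the random seed. The fixed seeds in the parameters above would then matter mainly for steps that are still sampled, such as drawing the speaker embedding.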

@github-actions github-actions bot added the stale The topic has been ignored for a long time label Jul 15, 2024