
ADD CREPE #1201

Closed
kyakuno opened this issue Aug 8, 2023 · 27 comments
kyakuno commented Aug 8, 2023

Pitch estimation from PCM using a CNN.
https://github.com/marl/crepe


kyakuno commented Aug 8, 2023

ONNX export:
marl/crepe#96

kyakuno commented Aug 8, 2023

It would be good to compare this code against world.
https://github.com/yqzhishen/onnxcrepe/blob/main/samples/demo.py

@kyakuno kyakuno self-assigned this Aug 8, 2023
kyakuno commented Aug 8, 2023

RVC can also select crepe.


@kyakuno kyakuno assigned ooe1123 and unassigned kyakuno Aug 8, 2023
kyakuno commented Aug 8, 2023

@ooe1123 We would like to make crepe selectable as the f0 method in RVC. Would it be possible to export the crepe model included in RVC to ONNX and add it to the RVC samples?

kyakuno commented Aug 8, 2023

The goal is to introduce pitch estimation that does not depend on world, in order to support f0 in Unity.

kyakuno commented Aug 9, 2023

The crepe inference code:

        elif f0_method == "crepe":
            model = "full"
            # Pick a batch size that doesn't cause memory errors on your gpu
            batch_size = 512
            # Compute pitch using first gpu
            audio = torch.tensor(np.copy(x))[None].float()
            f0, pd = torchcrepe.predict(
                audio,
                self.sr,
                self.window,
                f0_min,
                f0_max,
                model,
                batch_size=batch_size,
                device=self.device,
                return_periodicity=True,
            )
            # Smooth over 3-frame windows: median for periodicity, mean for pitch
            pd = torchcrepe.filter.median(pd, 3)
            f0 = torchcrepe.filter.mean(f0, 3)
            # Treat frames with low periodicity (confidence) as unvoiced
            f0[pd < 0.1] = 0
            f0 = f0[0].cpu().numpy()

kyakuno commented Aug 9, 2023

The preprocessing may be more complex than expected.
https://github.com/maxrmorrison/torchcrepe/blob/master/torchcrepe/core.py

kyakuno commented Aug 9, 2023

preprocess resamples the input waveform to 16 kHz and splits it into frames at hop_length intervals, infer runs the model to obtain the probabilities, and postprocess converts bins to f0 values.
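
A rough sketch of those three stages (the helper names here are illustrative, not the actual torchcrepe API; see torchcrepe/core.py for the real implementation):

SAMPLE_RATE = 16000   # crepe models operate on 16 kHz audio
WINDOW_SIZE = 1024    # samples per pitch estimate

def predict_sketch(audio, sr, hop_length):
    # preprocess: resample to 16 kHz and slice into 1024-sample frames
    audio16k = resample(audio, sr, SAMPLE_RATE)              # e.g. librosa.resample
    frames = make_frames(audio16k, WINDOW_SIZE, hop_length)  # (n_frames, 1024)

    # infer: the model outputs per-frame probabilities over 360 pitch bins
    probabilities = run_model(frames)                        # (n_frames, 360)

    # postprocess: decode one bin per frame, then convert bins to f0 in Hz
    bins = probabilities.argmax(axis=1)
    return bins_to_frequency(bins)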

kyakuno commented Aug 9, 2023

inp_f0 in rvc.py is always None.

kyakuno commented Aug 13, 2023

Uploaded the model.
https://storage.googleapis.com/ailia-models/rvc/crepe.onnx
The input is [n, 1024] and the output is [m, 360].
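
A minimal sketch of running the uploaded model with onnxruntime (the per-frame normalization mirrors what torchcrepe's preprocess does; real frame extraction is omitted):

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("crepe.onnx")

# Placeholder input: n frames of 1024 samples each
frames = np.random.randn(8, 1024).astype(np.float32)

# Normalize each frame to zero mean and unit variance, as torchcrepe's preprocess does
frames -= frames.mean(axis=1, keepdims=True)
frames /= np.maximum(frames.std(axis=1, keepdims=True), 1e-10)

input_name = session.get_inputs()[0].name
probabilities = session.run(None, {input_name: frames})[0]
print(probabilities.shape)  # (8, 360)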

kyakuno commented Aug 13, 2023

One frequency is output per 1024 samples. hop_size is computed in units of sample_rate / 100, so a frequency is computed every 10 ms.
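
The framing math this implies, as a sketch (assuming 16 kHz audio):

sample_rate = 16000
hop_length = sample_rate // 100             # 160 samples = 10 ms per hop
window_size = 1024                          # samples per frequency estimate
n_samples = sample_rate * 5                 # e.g. 5 seconds of audio
total_frames = 1 + n_samples // hop_length  # one f0 value every 10 ms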

kyakuno commented Aug 13, 2023

Running this in Unity, the load was higher than expected; looking into it, crepe appears to be the high-quality option.

Pitch extraction algorithm:
crepe
*To process high-quality audio, choose "dio", which is somewhat slower. If you want even higher quality, choose "harvest", which is slower still. For the best quality, choose "crepe" or "mangio-crepe", which put load on the GPU.

https://child-programmer.com/ai-voice-change-tutorial-ddpn/

kyakuno commented Aug 13, 2023

It feels like crepe would be faster with a 3-dimensional input and the batch fixed at 1, rather than the 2-dimensional input.

kyakuno commented Aug 14, 2023

Since the channel dimension sits inside the batch dimension, mapping it to conv2d via a 3-dimensional input looks difficult.

kyakuno commented Aug 14, 2023

Comparing crepe's argmax, weighted_argmax, and viterbi: with anything other than viterbi the f0 values sometimes jump, the quality is low, and the resulting audio is noisy. If the pd output alongside f0 is omitted, silent sections turn into a sustained, robot-like sound.

kyakuno commented Aug 14, 2023

They differ in how f0 is recovered from the 360 quantized f0 values (bins).

argmax

def argmax(logits):
    """Sample observations by taking the argmax"""
    bins = logits.argmax(axis=1)

    # Convert to frequency in Hz
    return bins, bins_to_frequency(bins)

viterbi

import numpy as np
import librosa
from scipy.special import softmax

def viterbi(logits):
    """Sample observations using viterbi decoding"""
    # Create viterbi transition matrix
    if not hasattr(viterbi, 'transition'):
        xx, yy = np.meshgrid(range(360), range(360))
        transition = np.maximum(12 - abs(xx - yy), 0)
        transition = transition / transition.sum(axis=1, keepdims=True)
        viterbi.transition = transition

    # Normalize logits
    sequences = softmax(logits, axis=1)

    # Perform viterbi decoding
    bins = np.array([
        librosa.sequence.viterbi(sequence, viterbi.transition).astype(np.int64)
        for sequence in sequences])

    # Convert to frequency in Hz
    return bins, bins_to_frequency(bins)
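
bins_to_frequency is not shown above; in crepe/torchcrepe each of the 360 bins spans 20 cents relative to a 10 Hz reference. A sketch of the conversion (torchcrepe additionally dithers within the bin to trade quantization error for noise, omitted here):

CENTS_PER_BIN = 20                  # each bin spans 20 cents
CENTS_OFFSET = 1997.3794084376191   # offset used by crepe/torchcrepe

def bins_to_frequency(bins):
    # bins -> cents, then cents -> Hz relative to the 10 Hz reference
    cents = CENTS_PER_BIN * bins + CENTS_OFFSET
    return 10 * 2 ** (cents / 1200)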

kyakuno commented Aug 14, 2023

For viterbi, a 360x360 meshgrid is built: identical bins score 12, the score decreases as the bins get farther apart, and it is floored at 0. Each row is then normalized to sum to 1 and used as the transition matrix. Softmax is applied to the logits, and librosa.sequence.viterbi computes the path, taking probs of shape [s, t] = (360, 512) as input.
https://librosa.org/doc/main/generated/librosa.sequence.viterbi.html
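
Rebuilding that transition matrix exactly as in the quoted code makes its structure easy to check:

import numpy as np

xx, yy = np.meshgrid(range(360), range(360))
transition = np.maximum(12 - abs(xx - yy), 0)
transition = transition / transition.sum(axis=1, keepdims=True)

print(transition.shape)                # (360, 360)
print(np.nonzero(transition[180])[0])  # bins 169..191: only nearby bins are reachable
print(transition[180].sum())           # 1.0: each row is a probability distribution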

kyakuno commented Aug 14, 2023

Think of it as adding transition costs to the argmax computation. The path is finally determined by walking backward from the most probable end state.
https://github.com/librosa/librosa/blob/main/librosa/sequence.py

@jit(nopython=True, cache=True)  # type: ignore
def _viterbi(
    log_prob: np.ndarray, log_trans: np.ndarray, log_p_init: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:  # pragma: no cover
    """Core Viterbi algorithm.

    This is intended for internal use only.

    Parameters
    ----------
    log_prob : np.ndarray [shape=(T, m)]
        ``log_prob[t, s]`` is the conditional log-likelihood
        ``log P[X = X(t) | State(t) = s]``
    log_trans : np.ndarray [shape=(m, m)]
        The log transition matrix
        ``log_trans[i, j] = log P[State(t+1) = j | State(t) = i]``
    log_p_init : np.ndarray [shape=(m,)]
        log of the initial state distribution

    Returns
    -------
    state : np.ndarray [shape=(T,)]
        The most likely state sequence
    logp : np.ndarray [shape=(1,)]
        The log-likelihood of the selected state sequence
    """
    n_steps, n_states = log_prob.shape

    state = np.zeros(n_steps, dtype=np.uint16)
    value = np.zeros((n_steps, n_states), dtype=np.float64)
    ptr = np.zeros((n_steps, n_states), dtype=np.uint16)

    # factor in initial state distribution
    value[0] = log_prob[0] + log_p_init

    for t in range(1, n_steps):
        # Want V[t, j] <- p[t, j] * max_k V[t-1, k] * A[k, j]
        #    assume at time t-1 we were in state k
        #    transition k -> j

        # Broadcast over rows:
        #    Tout[k, j] = V[t-1, k] * A[k, j]
        #    then take the max over columns
        # We'll do this in log-space for stability

        trans_out = value[t - 1] + log_trans.T

        # Unroll the max/argmax loop to enable numba support
        for j in range(n_states):
            ptr[t, j] = np.argmax(trans_out[j])
            # value[t, j] = log_prob[t, j] + np.max(trans_out[j])
            value[t, j] = log_prob[t, j] + trans_out[j, ptr[t][j]]

    # Now roll backward

    # Get the last state
    state[-1] = np.argmax(value[-1])

    for t in range(n_steps - 2, -1, -1):
        state[t] = ptr[t + 1, state[t + 1]]

    logp = value[-1:, state[-1]]

    return state, logp

kyakuno commented Aug 14, 2023

The initial value of p_init (uniform over all states):

        p_init = np.empty(n_states)
        p_init.fill(1.0 / n_states)

kyakuno commented Aug 14, 2023

pd holds the confidence values. f0 is set to 0 when the confidence is below 0.1.

kyakuno commented Aug 15, 2023

Merged into RVC.

@kyakuno kyakuno closed this as completed Aug 15, 2023
@kyakuno kyakuno mentioned this issue Aug 23, 2023