
ADD CREPE #1201

Closed
kyakuno opened this issue Aug 8, 2023 · 27 comments
kyakuno commented Aug 8, 2023

Pitch estimation from PCM using a CNN.
https://github.com/marl/crepe


kyakuno commented Aug 8, 2023

ONNX export:
marl/crepe#96

kyakuno commented Aug 8, 2023

It would be good to compare this code against world.
https://github.com/yqzhishen/onnxcrepe/blob/main/samples/demo.py

@kyakuno kyakuno self-assigned this Aug 8, 2023
kyakuno commented Aug 8, 2023

RVC can also select crepe.


@kyakuno kyakuno assigned ooe1123 and unassigned kyakuno Aug 8, 2023
kyakuno commented Aug 8, 2023

@ooe1123 We would like to make crepe selectable as the f0 method in RVC. Would it be possible to export the crepe model included in RVC to ONNX and add it to the RVC samples?

kyakuno commented Aug 8, 2023

The goal is to introduce pitch estimation that does not depend on world, in order to support f0 in Unity.

kyakuno commented Aug 9, 2023

The crepe inference code:

        elif f0_method == "crepe":
            model = "full"
            # Pick a batch size that doesn't cause memory errors on your gpu
            batch_size = 512
            # Compute pitch using first gpu
            audio = torch.tensor(np.copy(x))[None].float()
            f0, pd = torchcrepe.predict(
                audio,
                self.sr,
                self.window,
                f0_min,
                f0_max,
                model,
                batch_size=batch_size,
                device=self.device,
                return_periodicity=True,
            )
            # Smooth over 3-frame windows: median for periodicity, mean for pitch
            pd = torchcrepe.filter.median(pd, 3)
            f0 = torchcrepe.filter.mean(f0, 3)
            # Treat frames with low periodicity (confidence) as unvoiced
            f0[pd < 0.1] = 0
            f0 = f0[0].cpu().numpy()

kyakuno commented Aug 9, 2023

The preprocessing may be more complex than expected.
https://github.com/maxrmorrison/torchcrepe/blob/master/torchcrepe/core.py

kyakuno commented Aug 9, 2023

preprocess resamples the input waveform to 16 kHz and splits it into frames at hop_length intervals, infer runs the model to obtain the probabilities, and postprocess converts bins to f0 values.
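
A rough sketch of those three stages (the helper names here are illustrative, not the actual torchcrepe API; see torchcrepe/core.py for the real implementation):

SAMPLE_RATE = 16000   # crepe models operate on 16 kHz audio
WINDOW_SIZE = 1024    # samples per pitch estimate

def predict_sketch(audio, sr, hop_length):
    # preprocess: resample to 16 kHz and slice into 1024-sample frames
    audio16k = resample(audio, sr, SAMPLE_RATE)              # e.g. librosa.resample
    frames = make_frames(audio16k, WINDOW_SIZE, hop_length)  # (n_frames, 1024)

    # infer: the model outputs per-frame probabilities over 360 pitch bins
    probabilities = run_model(frames)                        # (n_frames, 360)

    # postprocess: decode one bin per frame, then convert bins to f0 in Hz
    bins = probabilities.argmax(axis=1)
    return bins_to_frequency(bins)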

kyakuno commented Aug 9, 2023

inp_f0 in rvc.py is always None.

kyakuno commented Aug 13, 2023

Uploaded the model.
https://storage.googleapis.com/ailia-models/rvc/crepe.onnx
The input is [n, 1024] and the output is [m, 360].
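
A minimal sketch of running the uploaded model with onnxruntime (the per-frame normalization mirrors what torchcrepe's preprocess does; real frame extraction is omitted):

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("crepe.onnx")

# Placeholder input: n frames of 1024 samples each
frames = np.random.randn(8, 1024).astype(np.float32)

# Normalize each frame to zero mean and unit variance, as torchcrepe's preprocess does
frames -= frames.mean(axis=1, keepdims=True)
frames /= np.maximum(frames.std(axis=1, keepdims=True), 1e-10)

input_name = session.get_inputs()[0].name
probabilities = session.run(None, {input_name: frames})[0]
print(probabilities.shape)  # (8, 360)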

kyakuno commented Aug 13, 2023

One frequency is output per 1024 samples. hop_size is computed in units of sample_rate / 100, so a frequency is computed every 10 ms.
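
The framing math this implies, as a sketch (assuming 16 kHz audio):

sample_rate = 16000
hop_length = sample_rate // 100             # 160 samples = 10 ms per hop
window_size = 1024                          # samples per frequency estimate
n_samples = sample_rate * 5                 # e.g. 5 seconds of audio
total_frames = 1 + n_samples // hop_length  # one f0 value every 10 ms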

kyakuno commented Aug 13, 2023

Running this in Unity, the load was higher than expected; looking into it, crepe appears to be the high-quality option.

Pitch extraction algorithm:
crepe
*To process high-quality audio, choose "dio", which is somewhat slower. If you want even higher quality, choose "harvest", which is slower still. For the best quality, choose "crepe" or "mangio-crepe", which put load on the GPU.

https://child-programmer.com/ai-voice-change-tutorial-ddpn/

kyakuno commented Aug 13, 2023

It feels like crepe would be faster with a 3-dimensional input and the batch fixed at 1, rather than the 2-dimensional input.

kyakuno commented Aug 14, 2023

Since the channel dimension sits inside the batch dimension, mapping it to conv2d via a 3-dimensional input looks difficult.

kyakuno commented Aug 14, 2023

Comparing crepe's argmax, weighted_argmax, and viterbi: with anything other than viterbi the f0 values sometimes jump, the quality is low, and the resulting audio is noisy. If the pd output alongside f0 is omitted, silent sections turn into a sustained, robot-like sound.

kyakuno commented Aug 14, 2023

They differ in how f0 is recovered from the 360 quantized f0 values (bins).

argmax

def argmax(logits):
    """Sample observations by taking the argmax"""
    bins = logits.argmax(axis=1)

    # Convert to frequency in Hz
    return bins, bins_to_frequency(bins)

viterbi

import numpy as np
import librosa
from scipy.special import softmax

def viterbi(logits):
    """Sample observations using viterbi decoding"""
    # Create viterbi transition matrix
    if not hasattr(viterbi, 'transition'):
        xx, yy = np.meshgrid(range(360), range(360))
        transition = np.maximum(12 - abs(xx - yy), 0)
        transition = transition / transition.sum(axis=1, keepdims=True)
        viterbi.transition = transition

    # Normalize logits
    sequences = softmax(logits, axis=1)

    # Perform viterbi decoding
    bins = np.array([
        librosa.sequence.viterbi(sequence, viterbi.transition).astype(np.int64)
        for sequence in sequences])

    # Convert to frequency in Hz
    return bins, bins_to_frequency(bins)
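
bins_to_frequency is not shown above; in crepe/torchcrepe each of the 360 bins spans 20 cents relative to a 10 Hz reference. A sketch of the conversion (torchcrepe additionally dithers within the bin to trade quantization error for noise, omitted here):

CENTS_PER_BIN = 20                  # each bin spans 20 cents
CENTS_OFFSET = 1997.3794084376191   # offset used by crepe/torchcrepe

def bins_to_frequency(bins):
    # bins -> cents, then cents -> Hz relative to the 10 Hz reference
    cents = CENTS_PER_BIN * bins + CENTS_OFFSET
    return 10 * 2 ** (cents / 1200)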

kyakuno commented Aug 14, 2023

For viterbi, a 360x360 meshgrid is built: identical bins score 12, the score decreases as the bins get farther apart, and it is floored at 0. Each row is then normalized to sum to 1 and used as the transition matrix. Softmax is applied to the logits, and librosa.sequence.viterbi computes the path, taking probs of shape [s, t] = (360, 512) as input.
https://librosa.org/doc/main/generated/librosa.sequence.viterbi.html
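
Rebuilding that transition matrix exactly as in the quoted code makes its structure easy to check:

import numpy as np

xx, yy = np.meshgrid(range(360), range(360))
transition = np.maximum(12 - abs(xx - yy), 0)
transition = transition / transition.sum(axis=1, keepdims=True)

print(transition.shape)                # (360, 360)
print(np.nonzero(transition[180])[0])  # bins 169..191: only nearby bins are reachable
print(transition[180].sum())           # 1.0: each row is a probability distribution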

kyakuno commented Aug 14, 2023

Think of it as adding transition costs to the argmax computation. The path is finally determined by walking backward from the most probable end state.
https://github.com/librosa/librosa/blob/main/librosa/sequence.py

@jit(nopython=True, cache=True)  # type: ignore
def _viterbi(
    log_prob: np.ndarray, log_trans: np.ndarray, log_p_init: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:  # pragma: no cover
    """Core Viterbi algorithm.

    This is intended for internal use only.

    Parameters
    ----------
    log_prob : np.ndarray [shape=(T, m)]
        ``log_prob[t, s]`` is the conditional log-likelihood
        ``log P[X = X(t) | State(t) = s]``
    log_trans : np.ndarray [shape=(m, m)]
        The log transition matrix
        ``log_trans[i, j] = log P[State(t+1) = j | State(t) = i]``
    log_p_init : np.ndarray [shape=(m,)]
        log of the initial state distribution

    Returns
    -------
    state : np.ndarray [shape=(T,)]
        The most likely state sequence
    logp : np.ndarray [shape=(1,)]
        The log-likelihood of the selected state sequence
    """
    n_steps, n_states = log_prob.shape

    state = np.zeros(n_steps, dtype=np.uint16)
    value = np.zeros((n_steps, n_states), dtype=np.float64)
    ptr = np.zeros((n_steps, n_states), dtype=np.uint16)

    # factor in initial state distribution
    value[0] = log_prob[0] + log_p_init

    for t in range(1, n_steps):
        # Want V[t, j] <- p[t, j] * max_k V[t-1, k] * A[k, j]
        #    assume at time t-1 we were in state k
        #    transition k -> j

        # Broadcast over rows:
        #    Tout[k, j] = V[t-1, k] * A[k, j]
        #    then take the max over columns
        # We'll do this in log-space for stability

        trans_out = value[t - 1] + log_trans.T

        # Unroll the max/argmax loop to enable numba support
        for j in range(n_states):
            ptr[t, j] = np.argmax(trans_out[j])
            # value[t, j] = log_prob[t, j] + np.max(trans_out[j])
            value[t, j] = log_prob[t, j] + trans_out[j, ptr[t][j]]

    # Now roll backward

    # Get the last state
    state[-1] = np.argmax(value[-1])

    for t in range(n_steps - 2, -1, -1):
        state[t] = ptr[t + 1, state[t + 1]]

    logp = value[-1:, state[-1]]

    return state, logp

kyakuno commented Aug 14, 2023

The initial value of p_init (uniform over all states):

        p_init = np.empty(n_states)
        p_init.fill(1.0 / n_states)

kyakuno commented Aug 14, 2023

pd holds the confidence values. f0 is set to 0 when the confidence is below 0.1.

kyakuno commented Aug 15, 2023

Merged into RVC.

@kyakuno kyakuno closed this as completed Aug 15, 2023
@kyakuno kyakuno mentioned this issue Aug 23, 2023