ADD ReazonSpeech #1014

kyakuno · 2023-01-23T05:04:29Z

Japanese speech recognition model
https://research.reazon.jp/projects/ReazonSpeech/

kyakuno · 2023-01-24T06:45:21Z

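"ReazonSpeech", an ultra-high-accuracy, commercially usable Japanese speech recognition model developed entirely in Japan, released free of charge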
https://prtimes.jp/main/html/rd/p/000000003.000102162.html

kyakuno · 2023-01-24T06:49:56Z

The model is provided as model files for ESPnet:
https://research.reazon.jp/projects/ReazonSpeech/quickstart.html

$ git clone https://github.com/espnet/espnet
$ cd espnet/tools
$ ./setup_anaconda.sh anaconda espnet 3.8
$ make
$ . activate_python.sh
$ python3 decode.py speech-001.wav

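気象庁は雪や路面の凍結による交通への影響暴風雪や
高波に警戒するとともに雪崩や屋根からの落雪にも
十分注意するよう呼びかけています
(Translation: "The Japan Meteorological Agency is urging people to watch out for the impact of snow and icy roads on traffic, blizzards and high waves, and to take great care with avalanches and snow falling from roofs.")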

kyakuno · 2023-02-07T11:43:10Z

How to run ReazonSpeech:
https://dev.classmethod.jp/articles/reazon-speech-transcribe-meeting/

ooe1123 · 2023-02-12T14:29:05Z

STFT support for ONNX export:
pytorch/pytorch#92087
https://github.com/urinieto/pytorch/tree/stft_onnx2

ooe1123 · 2023-02-12T14:46:26Z

○ encoder

espnet2/bin/asr_inference.py

class Speech2Text:
    ...
    @torch.no_grad()
    def __call__(
        self, speech: Union[torch.Tensor, np.ndarray]
    ):
        ...
        # b. Forward Encoder
        enc, _ = self.asr_model.encode(**batch)

↓

class Speech2Text:
    ...
    @torch.no_grad()
    def __call__(
        self, speech: Union[torch.Tensor, np.ndarray]
    ):
        ...
        # b. Forward Encoder
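        print("------>")
        from torch.autograd import Variable
        # speech (1, num_samples) and lengths (1,) are prepared earlier in __call__
        xx = (Variable(speech), Variable(lengths))
        # export encode() instead of the training-time forward()
        self.asr_model.forward = self.asr_model.encode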
        torch.onnx.export(
            self.asr_model, xx, 'xxx.onnx',
            input_names=["speech", "lengths"],
            output_names=["encoder_out", "encoder_out_lens"],
            dynamic_axes={'speech' : {1 : 'length'}},
            verbose=False, opset_version=17
        )
        print("<------")
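The patch above exports `encode` rather than `forward` by assigning to `self.asr_model.forward`, which leaves the model patched afterwards. A minimal sketch of a variant that restores the original method, assuming only that the exporter calls the module and the module dispatches to `self.forward` (as `torch.nn.Module.__call__` does). `swap_forward` and `ToyModule` are hypothetical names; `ToyModule` stands in for an `nn.Module` so the example runs without PyTorch.

```python
from contextlib import contextmanager


class ToyModule:
    """Stand-in for torch.nn.Module: calling the object dispatches to self.forward."""

    def __call__(self, *args):
        return self.forward(*args)

    def forward(self, x):
        return ("forward", x)

    def encode(self, x):
        return ("encode", x)


@contextmanager
def swap_forward(module, replacement):
    """Temporarily route module(...) to `replacement`, then restore the original."""
    had_instance_attr = "forward" in vars(module)
    original = vars(module).get("forward")
    module.forward = replacement  # instance attribute shadows the class method
    try:
        yield module
    finally:
        if had_instance_attr:
            module.forward = original
        else:
            del module.forward  # fall back to the class-level forward again


m = ToyModule()
with swap_forward(m, m.encode):
    inside = m(1)  # torch.onnx.export(m, ...) would trace this call
after = m(1)
print(inside, after)  # ('encode', 1) ('forward', 1)
```

Inside the `with` block one would call `torch.onnx.export(model, ...)`; the `finally` clause puts the class-level `forward` back even if the export raises.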

espnet/nets/pytorch_backend/nets_utils.py

def make_pad_mask(lengths, xs=None, length_dim=-1, maxlen=None):
    ...
    if not isinstance(lengths, list):
        lengths = lengths.long().tolist()

    bs = int(len(lengths))
    if maxlen is None:
        if xs is None:
            maxlen = int(max(lengths))
        else:
            ...

↓

def make_pad_mask(lengths, xs=None, length_dim=-1, maxlen=None):
    ...
    # keep lengths as a tensor: .tolist() would freeze the traced values into the graph
    # if not isinstance(lengths, list):
        # lengths = lengths.long().tolist()

    bs = lengths.size(0)
    if maxlen is None:
        if xs is None:
            maxlen = lengths.max()
        else:
            ...
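The rewrite keeps `lengths` as a tensor so the tracer records `lengths.size(0)` and `lengths.max()` as graph operations; `.tolist()` and `int(max(...))` turn them into Python numbers, which the tracer records as constants fixed to the lengths seen during export. For reference, a minimal pure-Python sketch of what the mask itself means, assuming the default `length_dim=-1`, `xs=None` case, where `True` marks padded positions. `pad_mask` is a hypothetical helper, not the ESPnet function.

```python
def pad_mask(lengths, maxlen=None):
    """Return mask[b][t] == True where t is past the valid length of sequence b."""
    if maxlen is None:
        maxlen = max(lengths)  # corresponds to lengths.max() in the patched code
    return [[t >= n for t in range(maxlen)] for n in lengths]


mask = pad_mask([3, 1, 2])
for row in mask:
    print(row)
# [False, False, False]
# [False, True, True]
# [False, False, True]
```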

espnet2/layers/stft.py

class Stft(torch.nn.Module, InversibleInterface):
    def forward(
        self, input: torch.Tensor, ilens: torch.Tensor = None
    ):
        ...
        if input.is_cuda or torch.backends.mkl.is_available():

↓

class Stft(torch.nn.Module, InversibleInterface):
    def forward(
        self, input: torch.Tensor, ilens: torch.Tensor = None
    ):
        ...
        # always take the torch.stft branch so the STFT can be traced and exported
        if 1 or input.is_cuda or torch.backends.mkl.is_available():
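Forcing the condition to true makes `Stft` always use the `torch.stft` branch, which is the path the STFT export support linked earlier in this thread addresses; `opset_version=17` in the encoder export is the first ONNX opset that defines an `STFT` operator. As a reference for what that operator computes, here is a naive pure-Python STFT sketch, assuming no centering or padding, a periodic Hann window and one-sided output (ESPnet's actual `n_fft`, hop length and `center` settings are not reproduced here). `naive_stft` and `hann` are illustrative helpers.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window (the torch.hann_window default)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]


def naive_stft(signal, n_fft, hop):
    """One-sided STFT without centering: returns frames x (n_fft // 2 + 1) complex bins."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(n_fft)]
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = sum(
                chunk[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                for i in range(n_fft)
            )
            bins.append(acc)
        frames.append(bins)
    return frames


# A cosine at exactly bin 2 of an 8-point frame peaks at k == 2.
n_fft, hop = 8, 4
x = [math.cos(2 * math.pi * 2 * t / n_fft) for t in range(16)]
spec = naive_stft(x, n_fft, hop)
print(len(spec), len(spec[0]))  # 3 5
print(max(range(5), key=lambda k: abs(spec[0][k])))  # 2
```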

ooe1123 · 2023-02-12T14:53:12Z

○ decoder

espnet2/asr/decoder/transformer_decoder.py

class BaseTransformerDecoder(AbsDecoder, BatchScorerInterface):
    def batch_score(
        self, ys: torch.Tensor, states: List[Any], xs: torch.Tensor
    ):
        if states[0] is None:
            batch_state = None
        else:
            ...
        ...
        logp, states = self.forward_one_step(ys, ys_mask, xs, cache=batch_state)

↓

class BaseTransformerDecoder(AbsDecoder, BatchScorerInterface):
    def batch_score(
        self, ys: torch.Tensor, states: List[Any], xs: torch.Tensor
    ):
        if states[0] is None:
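            # n_batch and n_layers are computed earlier in batch_score.
            # Zero-length caches of shape (n_batch, 0, 512) per layer replace None,
            # so the exported graph always receives the same cache inputs.
            batch_state = [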
                torch.stack([torch.zeros(0, 512) for b in range(n_batch)])
                for i in range(n_layers)
            ]
        else:
            ...
        ...
        print("------>")
        from torch.autograd import Variable
        xx = (
            Variable(ys), Variable(ys_mask), Variable(xs),
            Variable(batch_state[0]), Variable(batch_state[1]), Variable(batch_state[2]),
            Variable(batch_state[3]), Variable(batch_state[4]), Variable(batch_state[5]),
        )
        self.forward = self.forward_one_step2
        torch.onnx.export(
            self, xx, 'xxx.onnx',
            input_names=["tgt", "tgt_mask", "memory", "cache1", "cache2", "cache3", "cache4", "cache5", "cache6"],
            output_names=["y", "new_cache1", "new_cache2", "new_cache3", "new_cache4", "new_cache5", "new_cache6"],
            dynamic_axes={'tgt': [0, 1], 'tgt_mask': [1, 2], 'memory': [0,1], 
                'cache1': [0, 1], 'cache2': [0, 1], 'cache3': [0, 1], 
                'cache4': [0, 1], 'cache5': [0, 1], 'cache6': [0, 1],
                'y': [0], 'new_cache1': [0, 1], 'new_cache2': [0, 1], 'new_cache3': [0, 1], 
                'new_cache4': [0, 1], 'new_cache5': [0, 1], 'new_cache6': [0, 1]},
            verbose=False, opset_version=11
        )
        print("<------")

    def forward_one_step2(self, tgt, tgt_mask, memory, cache1, cache2, cache3, cache4, cache5, cache6):
        cache = [
            cache1, cache2, cache3, cache4, cache5, cache6
        ]
        return self.forward_one_step(tgt, tgt_mask, memory, cache=cache)
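`torch.onnx.export` takes a flat tuple of tensors, so `forward_one_step2` unpacks the per-layer cache list into positional arguments, and the patched `batch_score` replaces the `None` first-step state with zero-length caches so the graph always has the same six cache inputs. A minimal pure-Python sketch of both ideas, generalized to any number of layers; `flatten_cache_args`, `empty_cache` and `stub_step` are hypothetical, and nested lists stand in for tensors of shape `(batch, time, 512)`.

```python
def empty_cache(n_layers, n_batch):
    """One cache per layer: each batch entry has zero cached time steps."""
    return [[[] for _ in range(n_batch)] for _ in range(n_layers)]


def flatten_cache_args(step_fn, n_layers):
    """Wrap step_fn(tgt, tgt_mask, memory, cache=list) as f(tgt, tgt_mask, memory, *caches)."""

    def flat(tgt, tgt_mask, memory, *caches):
        assert len(caches) == n_layers, "one positional cache input per layer"
        y, new_cache = step_fn(tgt, tgt_mask, memory, cache=list(caches))
        return (y, *new_cache)  # flat outputs: y, new_cache1, ..., new_cacheN

    return flat


def stub_step(tgt, tgt_mask, memory, cache):
    """Pretend decoder step: every layer appends one new 'hidden state' per batch entry."""
    new_cache = [[hist + [len(hist)] for hist in layer] for layer in cache]
    return "logp", new_cache


n_layers, n_batch = 6, 2
step = flatten_cache_args(stub_step, n_layers)
caches = empty_cache(n_layers, n_batch)
for _ in range(3):
    y, *caches = step(None, None, None, *caches)
print(len(caches), caches[0])  # 6 [[0, 1, 2], [0, 1, 2]]
```

In the patch the six layers are hard-coded as `cache1` to `cache6`; each list here stands in for a tensor whose time axis (dynamic axis 1) grows by one per decoding step.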

ooe1123 · 2023-02-12T14:58:55Z

○ lm

espnet2/lm/seq_rnn_lm.py

class SequentialRNNLM(AbsLM):
    ...
    def batch_score(
        self, ys: torch.Tensor, states: torch.Tensor, xs: torch.Tensor
    ):
        ...
        ys, states = self(ys[:, -1:], states)

↓

class SequentialRNNLM(AbsLM):
    ...
    def batch_score(
        self, ys: torch.Tensor, states: torch.Tensor, xs: torch.Tensor
    ):
        ...
        if states is not None:
            print("nhid---", self.nhid)
            print("------>")
            from torch.autograd import Variable
            xx = (Variable(ys[:, -1:]), Variable(states[0]), Variable(states[1]))
            self._forward = self.forward
            self.forward = self.forward2
            torch.onnx.export(
                self, xx, 'seq_rnn_lm.onnx',
                input_names=["input", "h_0", "c_0"],
                output_names=["output", "h_n", "c_n"],
                dynamic_axes={'input' : [0], 'h_0' : [1], 'c_0' : [1], 
                    'output' : [0], 'h_n' : [1], 'c_n' : [1]},
                verbose=False, opset_version=11
            )
            print("<------")

    def forward2(
        self, input: torch.Tensor, h: torch.Tensor, c: torch.Tensor
    ):
        hidden = (h, c)
        output, states = self._forward(input, hidden)
        return (output, states[0], states[1])
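The exported `seq_rnn_lm.onnx` takes the last token plus the LSTM states as separate `h_0`/`c_0` inputs and returns `h_n`/`c_n`, so the caller has to carry those states from one step to the next. A minimal sketch of that loop, assuming a hypothetical `lm_step(input, h_0, c_0) -> (output, h_n, c_n)` callable that plays the role of the exported graph (for example, a thin wrapper around an inference session); `run_lm` and `stub_lm_step` are illustrative, and the stub just counts steps so the example runs anywhere.

```python
def run_lm(lm_step, tokens, init_h, init_c):
    """Feed tokens one at a time, carrying (h, c) between calls like the ONNX graph requires."""
    h, c = init_h, init_c
    outputs = []
    for tok in tokens:
        out, h, c = lm_step([[tok]], h, c)  # input shaped (batch=1, 1): only the last token
        outputs.append(out)
    return outputs, h, c


def stub_lm_step(inp, h, c):
    """Stand-in for the exported graph: echoes the token and increments the states."""
    token = inp[0][0]
    return token * 10, h + 1, c + 2


outs, h, c = run_lm(stub_lm_step, [3, 1, 4], 0, 0)
print(outs, h, c)  # [30, 10, 40] 3 6
```

Because the export branch only runs when `states is not None`, a standalone ONNX run still needs explicit initial states for the first step, for example zeros of shape `(n_layers, batch, nhid)` for an LSTM.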

ooe1123 · 2023-02-12T15:02:44Z

○ ctc

espnet/nets/scorers/ctc.py

class CTCPrefixScorer(BatchPartialScorerInterface):
    ...
    def batch_init_state(self, x: torch.Tensor):
        ...
        logp = self.ctc.log_softmax(x.unsqueeze(0))  # assuming batch_size = 1

↓

class CTCPrefixScorer(BatchPartialScorerInterface):
    ...
    def batch_init_state(self, x: torch.Tensor):
        ...
        print("------>")
        from torch.autograd import Variable
        x = Variable(x.unsqueeze(0))
        self.ctc.forward = self.ctc.log_softmax
        torch.onnx.export(
            self.ctc, x, 'xxx.onnx',
            input_names=["hs_pad"],
            output_names=["logp"],
            dynamic_axes={'hs_pad': [1], 'logp': [1]},
            verbose=False, opset_version=11
        )
        print("<------")
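Only `ctc.log_softmax` is exported here, so the prefix-scoring logic in `CTCPrefixScorer` still has to run outside the graph on the returned `logp`. For reference, a minimal sketch of the numerically stable log-softmax that the graph applies over the vocabulary axis of each encoder frame (in ESPnet the CTC head applies a linear projection before it); `log_softmax` below is a plain-Python stand-in, not the ESPnet method.

```python
import math


def log_softmax(logits):
    """log(exp(z_i) / sum_j exp(z_j)), computed with the max subtracted to avoid overflow."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


frame = [1000.0, 1001.0, 999.0]  # naive exp() would overflow on these logits
logp = log_softmax(frame)
print(round(sum(math.exp(v) for v in logp), 12))  # 1.0
print(max(range(3), key=lambda k: logp[k]))  # 1
```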

kyakuno added the high priority label Jan 24, 2023

ooe1123 self-assigned this Jan 26, 2023

ooe1123 mentioned this issue Feb 12, 2023 in Implement ReazonSpeech #1040 (Merged)
