ADD ReazonSpeech #1014

Open
kyakuno opened this issue Jan 23, 2023 · 8 comments

kyakuno commented Jan 23, 2023

Japanese speech recognition model
https://research.reazon.jp/projects/ReazonSpeech/


kyakuno commented Jan 24, 2023

"ReazonSpeech", an ultra-high-accuracy, commercially usable, fully Japan-made Japanese speech recognition model, released free of charge
https://prtimes.jp/main/html/rd/p/000000003.000102162.html


kyakuno commented Jan 24, 2023

The model files are provided in a form intended for use with espnet
https://research.reazon.jp/projects/ReazonSpeech/quickstart.html

$ git clone https://github.com/espnet/espnet
$ cd espnet/tools
$ ./setup_anaconda.sh anaconda espnet 3.8
$ make
$ . activate_python.sh
$ python3 decode.py speech-001.wav

気象庁は雪や路面の凍結による交通への影響暴風雪や
高波に警戒するとともに雪崩や屋根からの落雪にも
十分注意するよう呼びかけています

ooe1123 self-assigned this Jan 26, 2023

kyakuno commented Feb 7, 2023


ooe1123 commented Feb 12, 2023

STFT support for ONNX export
pytorch/pytorch#92087
https://github.com/urinieto/pytorch/tree/stft_onnx2


ooe1123 commented Feb 12, 2023

○ encoder

  • espnet2/bin/asr_inference.py (original first, then the modified version)
class Speech2Text:
    ...
    @torch.no_grad()
    def __call__(
        self, speech: Union[torch.Tensor, np.ndarray]
    ):
        ...
        # b. Forward Encoder
        enc, _ = self.asr_model.encode(**batch)

class Speech2Text:
    ...
    @torch.no_grad()
    def __call__(
        self, speech: Union[torch.Tensor, np.ndarray]
    ):
        ...
        # b. Forward Encoder
        print("------>")
        # Route forward() to encode() so the exporter traces the encoder only.
        self.asr_model.forward = self.asr_model.encode
        torch.onnx.export(
            self.asr_model, (speech, lengths), 'xxx.onnx',
            input_names=["speech", "lengths"],
            output_names=["encoder_out", "encoder_out_lens"],
            dynamic_axes={'speech' : {1 : 'length'}},  # variable number of samples
            verbose=False, opset_version=17  # the ONNX STFT operator needs opset 17
        )
        print("<------")
  • espnet/nets/pytorch_backend/nets_utils.py (original first, then the modified version)
def make_pad_mask(lengths, xs=None, length_dim=-1, maxlen=None):
    ...
    if not isinstance(lengths, list):
        lengths = lengths.long().tolist()

    bs = int(len(lengths))
    if maxlen is None:
        if xs is None:
            maxlen = int(max(lengths))
        else:
            ...

def make_pad_mask(lengths, xs=None, length_dim=-1, maxlen=None):
    ...
    # Keep lengths as a tensor: .tolist() would be frozen into constants when tracing.
    # if not isinstance(lengths, list):
        # lengths = lengths.long().tolist()

    bs = lengths.size(0)
    if maxlen is None:
        if xs is None:
            maxlen = lengths.max()
        else:
            ...
  • espnet2/layers/stft.py (original first, then the modified version)
class Stft(torch.nn.Module, InversibleInterface):
    def forward(
        self, input: torch.Tensor, ilens: torch.Tensor = None
    ):
        ...
        if input.is_cuda or torch.backends.mkl.is_available():

class Stft(torch.nn.Module, InversibleInterface):
    def forward(
        self, input: torch.Tensor, ilens: torch.Tensor = None
    ):
        ...
        # Always take the torch.stft branch (skip the librosa fallback) so the STFT can be exported.
        if True or input.is_cuda or torch.backends.mkl.is_available():
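
For reference, the values that make_pad_mask produces in the default xs=None case can be sketched without PyTorch. This is a simplified illustration only: `pad_mask` is a hypothetical name, not an espnet function, and it works on plain Python lists. The modified make_pad_mask above should produce the same mask, but it keeps `lengths` as a tensor, presumably so that the batch size and maximum length stay dynamic in the exported graph.

```python
from typing import List


def pad_mask(lengths: List[int]) -> List[List[bool]]:
    """Return mask[b][t] == True where position t is padding for item b.

    Hypothetical helper for illustration; mirrors the xs=None case only.
    """
    maxlen = max(lengths)
    # A position is padding once its index reaches the item's length.
    return [[t >= n for t in range(maxlen)] for n in lengths]


if __name__ == "__main__":
    for row in pad_mask([5, 3, 2]):
        print(row)
```

Every row has `max(lengths)` entries, and the longest item has no padded positions.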


ooe1123 commented Feb 12, 2023

○ decoder

  • espnet2/asr/decoder/transformer_decoder.py (original first, then the modified version)
class BaseTransformerDecoder(AbsDecoder, BatchScorerInterface):
    def batch_score(
        self, ys: torch.Tensor, states: List[Any], xs: torch.Tensor
    ):
        if states[0] is None:
            batch_state = None
        else:
            ...
        ...
        logp, states = self.forward_one_step(ys, ys_mask, xs, cache=batch_state)

class BaseTransformerDecoder(AbsDecoder, BatchScorerInterface):
    def batch_score(
        self, ys: torch.Tensor, states: List[Any], xs: torch.Tensor
    ):
        if states[0] is None:
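            # Empty per-layer caches (0 steps, 512 dims) stand in for None,
            # so every cache is a tensor input on the first step as well.
            batch_state = [
                torch.stack([torch.zeros(0, 512) for b in range(n_batch)])
                for i in range(n_layers)
            ]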
        else:
            ...
        ...
        print("------>")
        # Pass each layer's cache as a separate tensor input (assumes 6 decoder layers).
        xx = (ys, ys_mask, xs, *batch_state[:6])
        self.forward = self.forward_one_step2
        torch.onnx.export(
            self, xx, 'xxx.onnx',
            input_names=["tgt", "tgt_mask", "memory", "cache1", "cache2", "cache3", "cache4", "cache5", "cache6"],
            output_names=["y", "new_cache1", "new_cache2", "new_cache3", "new_cache4", "new_cache5", "new_cache6"],
            dynamic_axes={'tgt': [0, 1], 'tgt_mask': [1, 2], 'memory': [0, 1],
                'cache1': [0, 1], 'cache2': [0, 1], 'cache3': [0, 1],
                'cache4': [0, 1], 'cache5': [0, 1], 'cache6': [0, 1],
                'y': [0], 'new_cache1': [0, 1], 'new_cache2': [0, 1], 'new_cache3': [0, 1],
                'new_cache4': [0, 1], 'new_cache5': [0, 1], 'new_cache6': [0, 1]},
            verbose=False, opset_version=11
        )
        print("<------")

    def forward_one_step2(self, tgt, tgt_mask, memory, cache1, cache2, cache3, cache4, cache5, cache6):
        cache = [
            cache1, cache2, cache3, cache4, cache5, cache6
        ]
        return self.forward_one_step(tgt, tgt_mask, memory, cache=cache)
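
The hand-written name lists and dynamic_axes in the decoder export above follow a regular pattern, so they could also be generated from the layer count. The following is a minimal sketch, not part of the posted patch; `decoder_export_spec` is a hypothetical helper, and it only builds the Python lists and dict that would be passed to torch.onnx.export.

```python
from typing import Dict, List, Tuple


def decoder_export_spec(
    n_layers: int,
) -> Tuple[List[str], List[str], Dict[str, List[int]]]:
    """Build input names, output names and dynamic axes for n_layers caches.

    Hypothetical helper; the naming (cache1.., new_cache1..) follows the
    export call in this comment.
    """
    caches = [f"cache{i}" for i in range(1, n_layers + 1)]
    new_caches = [f"new_{name}" for name in caches]

    input_names = ["tgt", "tgt_mask", "memory"] + caches
    output_names = ["y"] + new_caches

    # Same axes as the hand-written dict in the export call.
    dynamic_axes = {"tgt": [0, 1], "tgt_mask": [1, 2], "memory": [0, 1], "y": [0]}
    for name in caches + new_caches:
        dynamic_axes[name] = [0, 1]
    return input_names, output_names, dynamic_axes


if __name__ == "__main__":
    inputs, outputs, axes = decoder_export_spec(6)
    print(inputs)
    print(outputs)
    print(axes)
```

With n_layers=6 this yields the same names and axes as the export call above. The wrapper forward_one_step2 would still need one positional argument per cache.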


ooe1123 commented Feb 12, 2023

○ lm

  • espnet2/lm/seq_rnn_lm.py (original first, then the modified version)
class SequentialRNNLM(AbsLM):
    ...
    def batch_score(
        self, ys: torch.Tensor, states: torch.Tensor, xs: torch.Tensor
    ):
        ...
        ys, states = self(ys[:, -1:], states)

class SequentialRNNLM(AbsLM):
    ...
    def batch_score(
        self, ys: torch.Tensor, states: torch.Tensor, xs: torch.Tensor
    ):
        ...
        if states is not None:
            print("nhid---", self.nhid)
            print("------>")
            # Export with the recurrent state split into separate h and c tensors.
            xx = (ys[:, -1:], states[0], states[1])
            self._forward = self.forward  # keep the original forward for forward2
            self.forward = self.forward2
            torch.onnx.export(
                self, xx, 'seq_rnn_lm.onnx',
                input_names=["input", "h_0", "c_0"],
                output_names=["output", "h_n", "c_n"],
                dynamic_axes={'input' : [0], 'h_0' : [1], 'c_0' : [1], 
                    'output' : [0], 'h_n' : [1], 'c_n' : [1]},
                verbose=False, opset_version=11
            )
            print("<------")

    def forward2(
        self, input: torch.Tensor, h: torch.Tensor, c: torch.Tensor
    ):
        hidden = (h, c)
        output, states = self._forward(input, hidden)
        return (output, states[0], states[1])


ooe1123 commented Feb 12, 2023

○ ctc

  • espnet/nets/scorers/ctc.py (original first, then the modified version)
class CTCPrefixScorer(BatchPartialScorerInterface):
    ...
    def batch_init_state(self, x: torch.Tensor):
        ...
        logp = self.ctc.log_softmax(x.unsqueeze(0))  # assuming batch_size = 1

class CTCPrefixScorer(BatchPartialScorerInterface):
    ...
    def batch_init_state(self, x: torch.Tensor):
        ...
        print("------>")
        x = x.unsqueeze(0)  # assuming batch_size = 1
        # Route forward() to log_softmax() so the exporter traces it.
        self.ctc.forward = self.ctc.log_softmax
        torch.onnx.export(
            self.ctc, x, 'xxx.onnx',
            input_names=["hs_pad"],
            output_names=["logp"],
            dynamic_axes={'hs_pad': [1], 'logp': [1]},
            verbose=False, opset_version=11
        )
        print("<------")
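
The graph exported above ends in a log-softmax over the vocabulary dimension. As a reminder of what that last step computes, here is a minimal pure-Python sketch for a single frame. It is an illustration under simplifying assumptions, not espnet code: `log_softmax` here is a standalone function on a list of floats, and any projection that the CTC module applies before the log-softmax is left out.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax of one frame of logits."""
    m = max(logits)
    # Subtract the maximum before exponentiating to avoid overflow.
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


if __name__ == "__main__":
    logp = log_softmax([2.0, 1.0, 0.1])
    print(logp)
    # The exponentiated values form a probability distribution.
    print(sum(math.exp(v) for v in logp))
```

The outputs are log-probabilities, so exponentiating them and summing over the vocabulary gives 1 up to rounding error.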
