Availability of OWSM-CTC #5683

wgb14 · 2024-02-27T07:48:30Z

Hi espnet team,

Thank you for your amazing work on OWSM. It greatly helps the open-source community, and I am truly grateful for your efforts.

I assume this repo is the proper place to discuss OWSM-related topics, and I would really like to know whether you are planning to release OWSM-CTC recipes and models here as well.

By the way, I'm curious about where I can keep up to date on OWSM. For now, I'm keeping an eye on issues and PRs in this repo and on papers from your lab page. I even had to get the latest OWSM-CTC paper from Google Scholar.

Thanks in advance.

sw005320 · 2024-02-27T12:02:16Z

Thanks a lot.
Yes, we plan to add it soon.
We will also keep this webpage up to date: https://www.wavlab.org/activities/2024/owsm/

@pyf98, can you make a PR for OWSM CTC and add the information to our webpage?

pyf98 · 2024-02-27T16:19:51Z

Hi @wgb14, thanks for your interest in our work!

As Shinji shared, I have prepared that webpage to collect OWSM-related papers and models. However, we sometimes cannot update it immediately due to anonymity considerations at certain venues.

For OWSM-CTC, it will take some time to merge it into the master branch. However, the model and code are already public:

Code in my repo: https://github.com/pyf98/espnet/tree/owsm-ctc
Current model on HF: https://huggingface.co/pyf98/owsm_ctc_v3.1_1B

An example script to run short-form ASR/ST:

import librosa
import soundfile as sf
from espnet2.bin.s2t_inference_ctc import Speech2TextGreedySearch


s2t = Speech2TextGreedySearch.from_pretrained(
    "pyf98/owsm_ctc_v3.1_1B",
    device="cuda",
    generate_interctc_outputs=False,
    lang_sym='<eng>',   # language token
    task_sym='<asr>',   # task token
)

speech, rate = sf.read("xxx.wav")
# Pad or trim to 30 s; the size assumes 16000 samples per second
speech = librosa.util.fix_length(speech, size=(16000 * 30))

res = s2t(speech)[0]
print(res)
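
The short-form script above pads or trims to `16000 * 30` samples, which suggests the model expects 30 s of 16 kHz audio. Below is a minimal sketch of an input check that could run before calling `s2t`. The helper name `prepare_short_form` is hypothetical, the 16 kHz requirement is inferred from the script rather than confirmed, and the mono mixdown is an added convenience that is not part of the original example.

```python
import numpy as np

TARGET_RATE = 16000     # assumption: inferred from the 16000 * 30 padding above
WINDOW_SECONDS = 30     # assumption: same 30 s window as the script above


def prepare_short_form(speech: np.ndarray, rate: int) -> np.ndarray:
    """Return a 1-D array of exactly 30 s of audio (hypothetical helper)."""
    if rate != TARGET_RATE:
        # This sketch does not resample; do that with your preferred tool first.
        raise ValueError(f"expected {TARGET_RATE} Hz audio, got {rate} Hz")
    if speech.ndim == 2:
        # soundfile returns (frames, channels) for multi-channel files
        speech = speech.mean(axis=1)
    target_len = TARGET_RATE * WINDOW_SECONDS
    if len(speech) >= target_len:
        return speech[:target_len]                       # trim long input
    return np.pad(speech, (0, target_len - len(speech)))  # zero-pad short input
```

With this helper, `speech = prepare_short_form(*sf.read("xxx.wav"))` would take the place of the `fix_length` call.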

An example script to run long-form ASR:

import soundfile as sf
import torch
from espnet2.bin.s2t_inference_ctc import Speech2TextGreedySearch


if __name__ == "__main__":
    context_len_in_secs = 4   # left and right context when doing buffered inference
    batch_size = 32   # depends on the GPU memory
    s2t = Speech2TextGreedySearch.from_pretrained(
        "pyf98/owsm_ctc_v3.1_1B",
        device='cuda' if torch.cuda.is_available() else 'cpu',
        generate_interctc_outputs=False,
        lang_sym='<eng>',
        task_sym='<asr>',
    )

    speech, rate = sf.read(
        "xxx.wav"
    )

    text = s2t.decode_long_batched_buffered(
        speech,
        batch_size=batch_size,
        context_len_in_secs=context_len_in_secs,
        frames_per_sec=12.5,        # 80ms shift, model-dependent, don't change
    )
    print(text)
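
To make the `context_len_in_secs` and `frames_per_sec` arguments more concrete, here is a rough sketch of the arithmetic behind buffered inference in general. It is not the implementation of `decode_long_batched_buffered`. The 30 s buffer length is an assumption taken from the short-form example, and `plan_buffers` is a hypothetical helper name.

```python
def plan_buffers(total_secs, buffer_secs=30.0, context_secs=4.0):
    """List the (kept, buffered) time spans for a long recording.

    Each buffer carries context_secs of audio on both sides of the part
    whose output is kept, so the kept part is shorter than the buffer.
    """
    chunk_secs = buffer_secs - 2 * context_secs
    if chunk_secs <= 0:
        raise ValueError("context is too long for the buffer")
    plans = []
    start = 0.0
    while start < total_secs:
        end = min(start + chunk_secs, total_secs)
        plans.append({
            "keep": (start, end),
            # Context is clipped at the beginning and end of the recording
            "buffer": (max(0.0, start - context_secs),
                       min(total_secs, end + context_secs)),
        })
        start += chunk_secs
    return plans


def frame_shift_ms(frames_per_sec=12.5):
    """Convert an output frame rate into a frame shift in milliseconds."""
    return 1000.0 / frames_per_sec
```

Under these assumptions, a 4 s context on each side of a 30 s buffer leaves 22 s of kept output per buffer, and 12.5 frames per second corresponds to the 80 ms shift mentioned in the script's comment.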

wgb14 · 2024-02-28T03:04:28Z

Thanks for your prompt responses! These help a lot.

teinhonglo · 2024-03-03T04:31:15Z

Are there any fine-tuning examples or scripts, or are there plans to release any in the future?
Thanks in advance.

pyf98 · 2024-03-03T15:29:45Z

Hi @teinhonglo, thanks for your question!

Fine-tuning is similar to the normal setup in ESPnet (if you are familiar with ESPnet).
You can prepare your data in the OWSM format and then remove the timestamps using the scripts under https://github.com/pyf98/espnet/tree/owsm-ctc/egs2/owsm_v3.1_ctc/s2t1/local
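
As a rough illustration of what "removing the timestamps" could look like, here is a small sketch. It assumes timestamp tokens are written as `<0.00>`, `<1.52>`, and so on; that token format and the helper name `strip_timestamps` are assumptions for illustration, so prefer the actual scripts in the linked `local` directory for real data preparation.

```python
import re

# Assumption: timestamps appear as tokens such as <0.00> or <12.34>
TIMESTAMP = re.compile(r"<\d+\.\d+>")


def strip_timestamps(text: str) -> str:
    """Remove timestamp tokens and collapse the leftover whitespace."""
    without = TIMESTAMP.sub(" ", text)
    return " ".join(without.split())


if __name__ == "__main__":
    print(strip_timestamps("<0.00> hello world<1.52><2.00> again<3.10>"))
```

Other special tokens such as `<eng>` or `<asr>` do not match the pattern and are left in place.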

wgb14 added the Question label Feb 27, 2024

sw005320 added the Feature request label Feb 27, 2024

pyf98 mentioned this issue Mar 21, 2024

No such parameter e_branchformer_ctc in encoder parameter #5712

Open
