
Very poor results #51

@MonolithFoundation

SenseVoice drops characters far too noticeably on this clip.

output3.mp3.zip

[{'key': 'output3',
'text': '<|zh|><|HAPPY|><|Applause|><|woitn|>我跳脱口秀大会舞吗李诞互相伤害吧我还挺怀念在那个浪姐的那段时间比我是都没怎么赢过好的舞输什坏事位选也是啊尽力去拼输了没关系至少跟我没关系大迎我也欢迎一下我们的领校各欢迎我们的返场领校园娜姐娜怎么想呢回来了跟我们玩我就是觉得像回家一不是上一场就来了来了之后啊回去宿得人快幸福天生默的人跟大家介绍一下赛舞生存但获得四的十六组演员已经分为四组将展开组内对决每组前两名直接晋级进入下一赛段的主题赛名遗憾淘汰今天投票呢满票票观人票票我是一个笑点特别低的人但是我希望我真看不出来我感觉你这辈子都不怎么笑应该是那种上去很冷酷的人特冷笑点特别来的时候笑去了下面话不多但是特别爱笑太好了今天要上场的十六位朋友已经都在被战来让我们看一下他们第一组王十孟川小欢迎欢迎下一组曹鹏医生姜子浩小徐智再组蛋李大爷 '
'rock庞博'}]
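The raw text begins with SenseVoice's rich-transcription tags (language, emotion, audio event, and ITN flag; `woitn` means "without ITN"). Only one tag group appears for the whole clip, which may indicate it was decoded as a single segment rather than once per VAD segment. Below is a minimal stdlib sketch for separating the tags from the transcript. It assumes every tag has the `<|...|>` form seen above. `split_tags` and `SenseVoiceSegment` are hypothetical names, and FunASR's own `rich_transcription_postprocess` remains the supported way to clean the text.

```python
import re
from typing import NamedTuple

# Matches one SenseVoice tag such as <|zh|>, <|HAPPY|>, <|Applause|> or <|woitn|>
# and captures the name between the <| and |> delimiters.
TAG_RE = re.compile(r"<\|([^|]*)\|>")


class SenseVoiceSegment(NamedTuple):  # hypothetical container
    tags: list[str]  # tag names in the order they appear
    text: str        # transcript with every tag removed


def split_tags(raw: str) -> SenseVoiceSegment:
    """Separate <|...|> tags from the transcript text (hypothetical helper)."""
    tags = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    return SenseVoiceSegment(tags, text)


raw = "<|zh|><|HAPPY|><|Applause|><|woitn|>我跳脱口秀大会舞吗 "
seg = split_tags(raw)
print(seg.tags)  # ['zh', 'HAPPY', 'Applause', 'woitn']
print(seg.text)  # 我跳脱口秀大会舞吗
```

Counting how many times a language tag such as `zh` occurs in `seg.tags` gives a rough count of decoded segments.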

It drops far too many characters; listen to the clip closely. The output also contains a lot of repetition.
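One way to put a number on the missing characters is to align the output against a hand-checked reference transcript and count deletions in the minimum edit-distance alignment. This is the character error rate (CER) commonly reported for Mandarin ASR. The sketch below assumes such a reference exists; it would have to be written by listening to the audio. `edit_ops` is a hypothetical helper, not part of FunASR.

```python
def edit_ops(ref: str, hyp: str) -> dict:
    """Character-level Levenshtein alignment of hyp against ref.

    Returns substitution, deletion and insertion counts plus CER
    (total edits divided by the reference length).
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference char deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis char inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: no extra cost
                continue
            c, s, d, k = dp[i - 1][j - 1]
            sub = (c + 1, s + 1, d, k)
            c, s, d, k = dp[i - 1][j]
            dele = (c + 1, s, d + 1, k)  # ref char missing from hyp
            c, s, d, k = dp[i][j - 1]
            ins = (c + 1, s, d, k + 1)   # extra char in hyp
            dp[i][j] = min(sub, dele, ins)  # lowest cost wins
    cost, subs, dels, inss = dp[n][m]
    return {"sub": subs, "del": dels, "ins": inss, "cer": cost / max(n, 1)}


r = edit_ops("我很喜欢这首歌", "我喜欢这歌")
print(r["del"])  # 2
```

A high `del` count relative to `sub` and `ins` would support the "dropped characters" observation more concretely than listening alone.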

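from funasr import AutoModel


class ASRService:  # hypothetical class name: the report omits the class header
    def __init__(self) -> None:
        model_dir = "iic/SenseVoiceCTC"  # unused: overwritten below
        # self.model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
        model_dir = "iic/SenseVoiceSmall"
        # self.model, self.kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
        self.model = AutoModel(
            model=model_dir,
            vad_model="fsmn-vad",  # voice activity detection, splits long audio
            vad_kwargs={"max_single_segment_time": 60000},  # cap VAD segments at 60 s (value in ms)
            punc_model="ct-punc",  # punctuation restoration model
            spk_model="cam++",  # speaker embedding model
        )

    def asr(self, wav_path):
        # result = self.model(wav_path)
        # return result
        # Note: in FunASR, AutoModel.generate() is the call that routes input through
        # vad_model/punc_model/spk_model; inference() appears to decode the whole file
        # in one pass, which would be consistent with the unpunctuated output above.
        result = self.model.inference(
            # data_in=wav_path,
            input=wav_path,
            language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
            use_itn=False,  # no inverse text normalization (hence the <|woitn|> tag)
            # **self.kwargs,
        )
        return result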
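For comparison, here is a minimal sketch of the call pattern shown in the SenseVoice README, as best we recall it. The audio goes through `generate()`, which applies the VAD model before decoding, and SenseVoice's own ITN (`use_itn=True`) supplies punctuation in place of `ct-punc`. The speaker model is left out. The names `GENERATE_KWARGS`, `load_model` and `transcribe` are hypothetical, the 30 s segment cap is an assumption, and every keyword argument should be checked against the installed FunASR version.

```python
from functools import lru_cache

# Keyword arguments modelled on the SenseVoice README example (verify against your
# FunASR version). A fresh cache={} is passed per call instead of being stored here.
GENERATE_KWARGS = {
    "language": "auto",    # or "zh" to pin Mandarin
    "use_itn": True,       # let SenseVoice emit punctuation/ITN itself
    "batch_size_s": 60,    # dynamic batching budget, in seconds of audio
    "merge_vad": True,     # merge short VAD segments before decoding
    "merge_length_s": 15,  # upper bound on merged segment length, in seconds
}


@lru_cache(maxsize=1)
def load_model():
    """Build the pipeline once; FunASR is imported lazily so this module loads without it."""
    from funasr import AutoModel

    return AutoModel(
        model="iic/SenseVoiceSmall",
        vad_model="fsmn-vad",
        vad_kwargs={"max_single_segment_time": 30000},  # assumption: 30 s cap (ms)
    )


def transcribe(wav_path: str) -> str:
    from funasr.utils.postprocess_utils import rich_transcription_postprocess

    # generate() (not inference()) is what sends the audio through the VAD model first.
    res = load_model().generate(input=wav_path, cache={}, **GENERATE_KWARGS)
    # FunASR's post-processor for SenseVoice's <|...|> rich-transcription tags.
    return rich_transcription_postprocess(res[0]["text"])
```

Usage would be `print(transcribe("output3.mp3"))`. Comparing that result with the output above, using the same reference transcript, would show whether segmentation alone accounts for the missing characters.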
