
Interference between hotwords #1727

Closed
kli017 opened this issue May 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@kli017

kli017 commented May 14, 2024

Notice: To resolve issues more efficiently, please raise issues following the template and fill in the details.

🐛 Bug

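I am using the speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx model in the runtime environment. After adding the following hotword list, the hotwords appear to interfere with one another. For example: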
针灸铜人 80
久通 80

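In testing, results such as 针灸通人 or 久铜 may appear. Does adding a hotword simply boost the probability of its individual tokens? If it were whole-word matching, I would expect the effect in the WFST not to be this large. Is there a way to solve this?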
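The distinction the question draws can be made concrete with a toy scoring model. The sketch below is hypothetical and is not how FunASR, its CLAS stage, or its WFST stage actually score hypotheses: it assumes a simple per-character bonus on one side and a whole-word bonus on the other, using the two hotwords from this issue. Under the per-character scheme, the mixed outputs 针灸通人 and 久铜 score exactly as well as the real hotwords, because every character they use belongs to some hotword; under whole-word matching they earn nothing.

```python
# Toy illustration only (hypothetical scoring, NOT FunASR's implementation):
# compare a per-character bonus with a whole-word bonus for this issue's hotwords.

HOTWORDS = {"针灸铜人": 80, "久通": 80}


def char_level_bonus(text: str, hotwords: dict[str, int]) -> int:
    """Reward every character that appears in ANY hotword, independently."""
    chars = {c for word in hotwords for c in word}
    return sum(1 for c in text if c in chars)


def whole_word_bonus(text: str, hotwords: dict[str, int]) -> int:
    """Reward only complete hotword occurrences, using each hotword's weight."""
    return sum(weight for word, weight in hotwords.items() if word in text)


for candidate in ["针灸铜人", "针灸通人", "久通", "久铜"]:
    print(candidate,
          char_level_bonus(candidate, HOTWORDS),
          whole_word_bonus(candidate, HOTWORDS))
```

Running it prints one line per candidate: the character-level column is 4 for both 针灸铜人 and 针灸通人, while the whole-word column is 80 and 0 respectively, so only whole-word matching separates the real hotword from the mixture.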

Environment

  • OS (e.g., Linux): Linux
  • FunASR Version (e.g., 1.0.0): FunASR
  • ModelScope Version (e.g., 1.11.0): runtime
  • PyTorch Version (e.g., 2.0.0):
  • How you installed funasr (pip, source):
  • Python version:
  • Docker version: funasr-runtime-sdk-online-cpu-0.1.9
  • Any other relevant information:
@kli017 kli017 added the bug Something isn't working label May 14, 2024
@kli017 kli017 changed the title from "Hotwords directly interfere with each other" to "Interference between hotwords" May 14, 2024
@R1ckShi
Collaborator

R1ckShi commented May 28, 2024

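In the runtime, hotword handling has two parts. The first is CLAS-based neural (NN) hotword biasing, in which attention is used to match the hotwords against the decoder's information.
When hotwords conflict, the attention mechanism can produce incorrect correlations, and there is no good fix for that.
A possible workaround is to split long hotwords into shorter ones, or to extend short hotwords into longer ones.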
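Since the suggested workaround depends on knowing which hotwords conflict, a simple pre-check can flag pairs whose pronunciations overlap. In this issue, 久通 (jiǔ tōng) sounds almost identical to the middle of 针灸铜人 (zhēn jiǔ tóng rén); only the tone of "tong" differs, which fits the observed 针灸通人 and 久铜 errors. The helper below is hypothetical and not part of FunASR: its toneless pinyin table is hand-written for these six characters only, and it only detects one hotword's syllables appearing as a contiguous run inside another's.

```python
# Hypothetical pre-check (not part of FunASR): flag hotword pairs whose
# toneless pronunciations overlap. The pinyin table is hand-written for
# this example only; a real tool would need a full pronunciation lexicon.
PINYIN = {"针": "zhen", "灸": "jiu", "铜": "tong", "人": "ren", "久": "jiu", "通": "tong"}


def syllables(word: str) -> list[str]:
    """Map each character to its toneless pinyin syllable."""
    return [PINYIN[ch] for ch in word]


def contains(seq: list[str], sub: list[str]) -> bool:
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def find_conflicts(hotwords: list[str]) -> list[tuple[str, str]]:
    """Return (longer, shorter) pairs where the shorter one's syllables sit inside the longer."""
    conflicts = []
    for a in hotwords:
        for b in hotwords:
            if a != b and contains(syllables(a), syllables(b)):
                conflicts.append((a, b))
    return conflicts


print(find_conflicts(["针灸铜人", "久通"]))      # [('针灸铜人', '久通')]
print(find_conflicts(["针灸", "铜人", "久通"]))  # []
```

Splitting 针灸铜人 into 针灸 and 铜人, as suggested above, removes the full overlap, although 久通 still shares one syllable with each half; whether that actually reduces the interference in the model would need to be tested.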

@R1ckShi R1ckShi closed this as completed May 28, 2024

2 participants