Add a semantics-based text splitting method #248

Merged
merged 1 commit into from
May 5, 2023

Conversation

roydcai
Contributor

@roydcai roydcai commented May 5, 2023

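Added damo/nlp_bert_document-segmentation_chinese-base, a semantic document segmentation model developed by DAMO Academy. In my testing so far it works well on novels and encyclopedia-style text, which is likely related to the model having been trained on Wikipedia data. If needed, you can train a customized semantic segmentation model on data from your own domain; see the paper: https://arxiv.org/abs/2107.09278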

@imClumsyPanda imClumsyPanda merged commit 23a6b26 into chatchat-space:master May 5, 2023
@imClumsyPanda
Collaborator

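Thank you very much for the contribution. This method is now defined as the AliTextSplitter class, and an option will be added later so that users can choose the sentence-splitting method.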
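
For readers who want to see roughly what a semantic splitter built on this model does, here is a minimal sketch. It is not the merged AliTextSplitter code. The function names `build_segmenter` and `semantic_split` are hypothetical, and the task name `document-segmentation`, the `documents=` keyword, and the `"\n\t"` separator in the returned `"text"` field are assumptions based on the ModelScope model card that should be checked against the installed version.

```python
from typing import Callable, List, Optional

MODEL_ID = "damo/nlp_bert_document-segmentation_chinese-base"


def build_segmenter(model: str = MODEL_ID, device: str = "cpu") -> Callable:
    """Create the ModelScope pipeline (hypothetical helper)."""
    # Imported lazily so that modelscope is only required when a real
    # pipeline is actually built.
    from modelscope.pipelines import pipeline

    # Assumption: task name and keyword arguments follow the model card.
    return pipeline(task="document-segmentation", model=model, device=device)


def semantic_split(text: str, segmenter: Optional[Callable] = None) -> List[str]:
    """Split text into semantically coherent paragraphs."""
    if segmenter is None:
        segmenter = build_segmenter()
    result = segmenter(documents=text)
    # Assumption: result["text"] holds the predicted paragraphs joined by
    # a newline followed by a tab.
    return [seg.strip() for seg in result["text"].split("\n\t") if seg.strip()]
```

Passing the segmenter in as an argument lets the pipeline be built once and reused across documents, instead of reloading the model on every call.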

@cat1222

cat1222 commented May 12, 2023

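In an intranet environment, this class cannot load the semantic segmentation model resources from Alibaba Cloud. Is there a workaround?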

@roydcai
Contributor Author

roydcai commented May 12, 2023

You can download the model to a local directory and load it from there:
git lfs install
git clone http://www.modelscope.cn/damo/nlp_bert_document-segmentation_chinese-base.git
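
Once the repository has been cloned, the pipeline can be pointed at the local directory instead of the hub ID. The sketch below assumes that ModelScope's `pipeline` accepts a local directory path as its `model` argument. The helpers `local_model_path` and `load_local_segmenter` and the example directory are hypothetical.

```python
from pathlib import Path


def local_model_path(local_dir: str) -> str:
    """Return the cloned model directory, failing early if it is missing."""
    path = Path(local_dir).expanduser()
    if not path.is_dir():
        # On an intranet there is no hub to fall back to, so fail loudly.
        raise FileNotFoundError(f"Model directory not found: {path}")
    return str(path)


def load_local_segmenter(local_dir: str, device: str = "cpu"):
    """Build the segmentation pipeline from a local clone (hypothetical helper)."""
    # Imported lazily so that modelscope is only needed when loading.
    from modelscope.pipelines import pipeline

    # Assumption: `model` may be a local directory instead of a hub model ID.
    return pipeline(
        task="document-segmentation",
        model=local_model_path(local_dir),
        device=device,
    )


# Example (hypothetical path):
# segmenter = load_local_segmenter("~/models/nlp_bert_document-segmentation_chinese-base")
```

Checking the directory first gives a clear error message instead of a network timeout when the clone is missing or the path is mistyped.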

@cat1222

cat1222 commented May 16, 2023

> You can download the model to a local directory and load it from there: git lfs install, then git clone http://www.modelscope.cn/damo/nlp_bert_document-segmentation_chinese-base.git
thx!!!!
