Add text similarity task for Taskflow #1345

Merged
merged 11 commits into from
Nov 29, 2021

@linjieccc (Contributor) commented Nov 22, 2021

PR types

New features

PR changes

APIs

Description

1. Add text similarity task for Taskflow

@linjieccc marked this pull request as ready for review November 22, 2021 11:50
@@ -174,6 +176,20 @@ senta("作为老的四星酒店,房间依然很整洁,相当不错。机场
>>> [{'text': '作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。', 'label': 'positive', 'score': 0.984320878982544}]
```

### 文本匹配
Member

"文本相似度计算" (text similarity computation) would be more intuitive.

Contributor Author

Fixed.

@@ -11,6 +11,7 @@
- [文本纠错](#文本纠错)
- [句法分析](#句法分析)
- [情感分析](#情感分析)
- [文本匹配](#文本匹配)
Member

Use "文本相似度" (text similarity).

Contributor Author

Fixed.


usage = r"""
from paddlenlp import Taskflow

Member

Would renaming the task to text similarity be more descriptive?

Contributor Author

Fixed.

@linjieccc changed the title from "Add text matching task for Taskflow" to "Add text similarity task for Taskflow" Nov 22, 2021

similarity = Taskflow("text_similarity")
similarity([["世界上什么东西最小", "世界上什么东西最小?"]])
>>> [{'query': '世界上什么东西最小', 'title': '世界上什么东西最小?', 'similarity': 0.992725}]
Member

Using text1 and text2 as the keys would probably be more accurate. With query and title, readers will tend to assume this is matching a short text against a long text.

Contributor Author

Fixed.
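
A minimal sketch of the result shape agreed on in this thread, assuming each input is a pair of texts and the scores have already been computed. `format_results` is a hypothetical helper written for illustration; it is not a function from PaddleNLP.

```python
from typing import Dict, List, Sequence, Union


def format_results(
    pairs: Sequence[Sequence[str]], scores: Sequence[float]
) -> List[Dict[str, Union[str, float]]]:
    """Pair each (text1, text2) input with its score using neutral key names."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    results = []
    for (text1, text2), score in zip(pairs, scores):
        # text1/text2 avoid implying a short-query vs. long-title relationship.
        results.append({"text1": text1, "text2": text2, "similarity": score})
    return results


pairs = [["世界上什么东西最小", "世界上什么东西最小?"]]
print(format_results(pairs, [0.992725]))
```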

[{'text1': '世界上什么东西最小', 'text2': '世界上什么东西最小?', 'similarity': 0.992725}]
'''

similarity = Taskflow("text_similarity", batch_size=2)
Member

Does batch_size need to be configured manually here? Could it be derived automatically from the size of the input?

Member

Does this mean that with batch_size=1 two samples cannot be passed in at once? Or is batch_size a key parameter of the predictor?

Contributor Author

batch_size is currently configured manually and defaults to 1. The idea is to let users set it according to their own machine.

batch_size is a key parameter of the predictor.
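
To illustrate why batch_size does not limit how many samples can be passed in, here is a rough sketch that assumes the task splits its input into chunks of at most batch_size before each predictor call. `iter_batches` is a hypothetical helper; the PR excerpt does not show the actual batching code.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(examples: Sequence[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size examples."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(examples), batch_size):
        yield list(examples[start:start + batch_size])


# Three input pairs are accepted for any batch_size; only the number
# of predictor calls (one per chunk) changes.
pairs = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
for batch_size in (1, 2):
    batches = list(iter_batches(pairs, batch_size))
    print(batch_size, [len(batch) for batch in batches])
```

Under this assumption, a larger batch_size trades memory for fewer predictor calls, which is why it is left for users to tune to their hardware.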

@ZeyuChen (Member) left a comment

I suggest the internal code use text1 and text2 throughout, rather than text1 on the outside and query on the inside.

self.input_handles[1].copy_from_cpu(t_segment_ids)
self.predictor.run()
vecs_title = self.output_handle[1].copy_to_cpu()

Member

I suggest the internal code use text1 and text2 throughout.

Contributor Author

Fixed.
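
The excerpt does not show how the score is derived from the two output vectors. As a sketch only, assuming the score is the cosine similarity of the two text embeddings, the computation with text1/text2 naming could look like this. `cosine_similarity` and its parameter names are illustrative, not taken from the PR.

```python
import math
from typing import Sequence


def cosine_similarity(vec_text1: Sequence[float], vec_text2: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(vec_text1) != len(vec_text2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_text1, vec_text2))
    norm1 = math.sqrt(sum(a * a for a in vec_text1))
    norm2 = math.sqrt(sum(b * b for b in vec_text2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Vectors pointing in the same direction score close to 1.0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```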

similarity([["世界上什么东西最小", "世界上什么东西最小?"]])
>>> [{'text1': '世界上什么东西最小', 'text2': '世界上什么东西最小?', 'similarity': 0.992725}]

similarity = Taskflow("text_similarity", batch_size=2)
Member

We still need to tell developers what batch_size=2 is for.

Member

The API parameter descriptions in the other examples should be improved in the same way. Otherwise this is misleading: readers may think batch_size=2 must be set in order to pass in two samples.

Contributor Author

Fixed. I updated the code example here and added descriptions of the configurable parameters.

@ZeyuChen (Member) left a comment

LGTM

@ZeyuChen merged commit eec798a into PaddlePaddle:develop Nov 29, 2021
@linjieccc deleted the add_simbert branch November 29, 2021 03:40
2 participants