Add metrics api guide #236
@@ -0,0 +1,61 @@ | |||
.. _api_guide_optimizer: |
_api_guide_metrics:
ok
.. _api_guide_optimizer: | ||
|
||
|
||
Metrics |
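Please use Chinese for the section title: 评价指标 (Evaluation Metrics).
Please also change the section titles below to Chinese accordingly.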
ok
------------------ | ||
|
||
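:code:`Precision` measures, in binary classification, the ratio of correctly retrieved positives to all retrieved samples. :code:`Accuracy` measures the ratio of correctly classified samples to the total number of samples. Note that precision and accuracy are defined differently; the difference can be likened to that between :code:`Variance` and :code:`Bias` in error analysis. :code:`Recall` measures the ratio of correctly retrieved positives to all actual positive samples. Precision and recall constrain each other, so a real model has to trade one off against the other; see `Precision_and_recall <https://en.wikipedia.org/wiki/Precision_and_recall>`_ .
:code:`Auc` applies to the evaluation of binary classification models and computes the `area under the ROC curve <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`_. :code:`Auc` is implemented in Python; if performance matters, :code:`fluid.layers.auc` can be used instead.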
- Precision :code:`Precision`: XXX
- Accuracy :code:`Accuracy`: XXX
- Recall :code:`Recall`: XXX
- Area under the ROC curve :code:`Auc`: XXX
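
For readers following this thread, the three ratios described in the quoted paragraph can be written out from confusion-matrix counts. The following is a minimal sketch in plain Python, not the ``fluid.metrics`` implementation; the function name ``binary_classification_metrics`` and the sample data are hypothetical.

```python
def binary_classification_metrics(preds, labels):
    """Return (precision, recall, accuracy) for 0/1 predictions and labels."""
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # true negatives

    # Guard against empty denominators so the sketch never divides by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical data: 8 samples, 3 predicted positive, 4 actually positive.
preds = [1, 1, 0, 1, 0, 0, 0, 0]
labels = [1, 0, 0, 1, 1, 1, 0, 0]
precision, recall, accuracy = binary_classification_metrics(preds, labels)
print(round(precision, 3), recall, accuracy)  # 0.667 0.5 0.625
```

Here precision and recall differ because the denominators differ: predicted positives for precision, actual positives for recall.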
|
||
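During or after neural network training, the training effect of the model needs to be evaluated, and the usual approach is to compute the distance between all predictions and all ground-truth values (labels). Different models use different metrics; for example, classification models often use :code:`AUC` to measure classification quality, and OCR models can use :code:`EditDistance` to measure recognition quality.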
|
||
1.MetricBase |
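- Remove the numeric labels, and order the sections by how commonly they are used.
- MetricBase is only needed when defining custom metrics; ordinary users will not need it, so consider leaving it out.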
ok
|
||
For the API reference, see :ref:`api_fluid_metrics_CompositeMetric`
|
||
3.Precision/Accuracy/Recall/Auc |
This should be introduced first, since it is very widely used.
ok
Metrics | ||
######### | ||
|
||
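During or after neural network training, the training effect of the model needs to be evaluated, and the usual approach is to compute the distance between all predictions and all ground-truth values (labels). Different models use different metrics; for example, classification models often use :code:`AUC` to measure classification quality, and OCR models can use :code:`EditDistance` to measure recognition quality.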
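It seems the evaluation method depends more on the task than on the model. I suggest adjusting where "model" and "task" are used, for example in "Different models use different metrics".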
ok
4.ChunkEvaluator | ||
------------------ | ||
|
||
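:code:`ChunkEvaluator` is a grouping evaluation metric. It takes the output of the :code:`chunk_eval` interface, accumulates the grouping statistics of each minibatch, and finally computes precision, recall, and F1. :code:`ChunkEvaluator` supports four tagging schemes: IOB, IOE, IOBES, and IO. See `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_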
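I suggest translating "chunk" as 语块 (the linguistic term for a chunk), and adding an example of the task scenario where it is used.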
2.CompositeMetric | ||
------------------ | ||
|
||
:code:`CompositeMetric` can combine multiple metrics: predictions and labels only need to be supplied once per minibatch to obtain the values of all of them.
Same as @luotao1: I suggest putting Precision/Accuracy/Recall/Auc first, and illustrating CompositeMetric with those other metrics as examples.
ok
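
To make the composite idea discussed in this thread concrete, here is a rough sketch of the pattern: several metric objects share one ``update`` call per minibatch. All class names below (``AccuracySketch``, ``PrecisionSketch``, ``CompositeSketch``) are hypothetical and written in plain Python; this is not the ``fluid.metrics.CompositeMetric`` API, whose exact signatures should be checked in the API reference.

```python
class AccuracySketch:
    """Running accuracy over 0/1 predictions (hypothetical helper)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        self.correct += sum(1 for p, y in zip(preds, labels) if p == y)
        self.total += len(labels)

    def eval(self):
        return self.correct / self.total if self.total else 0.0


class PrecisionSketch:
    """Running precision over 0/1 predictions (hypothetical helper)."""

    def __init__(self):
        self.tp = 0
        self.fp = 0

    def update(self, preds, labels):
        for p, y in zip(preds, labels):
            if p == 1 and y == 1:
                self.tp += 1
            elif p == 1:
                self.fp += 1

    def eval(self):
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0


class CompositeSketch:
    """Forwards each minibatch to every wrapped metric (hypothetical helper)."""

    def __init__(self, metrics):
        self.metrics = list(metrics)

    def update(self, preds, labels):
        for metric in self.metrics:
            metric.update(preds, labels)

    def eval(self):
        return [metric.eval() for metric in self.metrics]


composite = CompositeSketch([AccuracySketch(), PrecisionSketch()])
# Two hypothetical minibatches of (predictions, labels).
for preds, labels in [([1, 0, 1], [1, 0, 0]), ([0, 1], [1, 1])]:
    composite.update(preds, labels)
accuracy, precision = composite.eval()
```

The point of the pattern is that the training loop passes predictions and labels once, and each wrapped metric keeps its own running counts.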
…to add_metrics_api_guide
Revised based on everyone's suggestions.
|
||
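Evaluation Metrics
##################
During or after neural network training, the training effect of the model needs to be evaluated, and the usual approach is to compute the distance between all predictions and all ground-truth values (labels); different types of tasks use different evaluation methods.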
After "the training effect of the model needs to be evaluated", change the comma to a period.
ok
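Sequence Labeling Task Evaluation
---------------------------------
In sequence labeling tasks, the model's primary goal is to group the input tokens into what are called chunks.
The chunk evaluation method :code:`ChunkEvaluator` takes the output of the :code:`chunk_eval` interface, accumulates the chunk statistics of each minibatch, and finally computes precision, recall, and F1. :code:`ChunkEvaluator` supports four tagging schemes: IOB, IOE, IOBES, and IO. See `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_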
Line 38 is missing a period at the end.
ok
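
As an illustration of the chunk-level precision, recall, and F1 described in the quoted text, the sketch below extracts chunks from IOB tag sequences and accumulates counts across sentences. It is plain Python written for this thread, not the ``chunk_eval`` or ``ChunkEvaluator`` implementation. The helper names, the ``B-``/``I-``/``O`` tag strings, and the rule that a chunk counts as correct only when both its span and its type match are assumptions of the sketch.

```python
def extract_chunks(tags):
    """Return a set of (start, end, type) chunks from IOB tags such as 'B-PER'."""
    chunks = set()
    start, chunk_type = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == chunk_type
        if continues:
            continue
        if start is not None:
            chunks.add((start, i, chunk_type))
        start, chunk_type = None, None
        if prefix in ("B", "I"):
            # Assumption: an "I-" tag with no matching open chunk starts a new one.
            start, chunk_type = i, label
    return chunks


def chunk_prf(sentences):
    """Accumulate chunk counts over (pred_tags, gold_tags) pairs; return (P, R, F1)."""
    num_pred = num_gold = num_correct = 0
    for pred_tags, gold_tags in sentences:
        pred, gold = extract_chunks(pred_tags), extract_chunks(gold_tags)
        num_pred += len(pred)
        num_gold += len(gold)
        num_correct += len(pred & gold)
    precision = num_correct / num_pred if num_pred else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if num_correct else 0.0
    return precision, recall, f1


# Hypothetical sentence: the second chunk has the right span but the wrong type.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(chunk_prf([(pred, gold)]))  # (0.5, 0.5, 0.5)
```

Accumulating raw counts and dividing once at the end mirrors the per-minibatch accumulation the text describes, and avoids averaging per-batch ratios.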
|
||
Sequence Labeling Task Evaluation
---------------------------------
In sequence labeling tasks, the model's primary goal is to group the input tokens into what are called chunks.
I suggest revising this: besides grouping the tokens, the model also has to classify them. See the README in https://github.com/PaddlePaddle/models/tree/develop/legacy/sequence_tagging_for_ner for reference.
Right, this passage says "primary goal". Classification could be mentioned, but it has no bearing on what follows, so I left it out.
Edit distance, :code:`EditDistance`, measures the similarity of two strings. See `Edit_distance <https://en.wikipedia.org/wiki/Edit_distance>`_.
|
||
For the API reference, see :ref:`api_fluid_metrics_EditDistance`
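
For reference, the edit distance mentioned above is commonly computed as the Levenshtein distance with dynamic programming. The sketch below is a plain-Python illustration under that assumption, with unit cost for insertion, deletion, and substitution; it is not the :code:`EditDistance` implementation, and the function name ``edit_distance`` is hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences, using two rolling rows."""
    # prev[j] is the distance between the processed prefix of source and target[:j].
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into an empty target takes i deletions
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(
                prev[j] + 1,         # delete s
                curr[j - 1] + 1,     # insert t
                prev[j - 1] + cost,  # substitute s with t, or keep a match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are more similar; dividing by the length of the target string gives a length-normalized variant, if one is wanted.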
|
CompositeMetric could be added at the end.
CompositeMetric and MetricBase do not seem to be used very often, so I left them out; they also do not fit well with the task-based grouping used above.
|
||
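Generation Task Evaluation
--------------------------
A generation task produces output directly from the input. In NLP tasks this means generating a new string; to evaluate the distance between the generated string and the target string, edit distance can be used.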
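I suggest switching the example to a task that must preserve order, such as OCR or speech recognition (both already available in models).
Also, I personally feel the "Generation Task Evaluation" category may not be entirely sound: a generation task such as translation can also use classification-style metrics like Accuracy. Please consider whether there is a better way to organize this, and adjust it if so.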
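This evaluation method is mainly used in generation tasks such as translation. That is not to say translation uses only one evaluation method, and this point can be emphasized at the start. Grouping by task type is mainly meant to take the user's perspective, so it may not be very rigorous. But the guide is mainly there to orient users; for precise descriptions they still need to read the API docs.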
@@ -34,16 +34,18 @@ | |||
|
|||
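Sequence Labeling Task Evaluation
---------------------------------
In sequence labeling tasks, the model's primary goal is to group the input tokens into what are called chunks.
The chunk evaluation method :code:`ChunkEvaluator` takes the output of the :code:`chunk_eval` interface, accumulates the chunk statistics of each minibatch, and finally computes precision, recall, and F1. :code:`ChunkEvaluator` supports four tagging schemes: IOB, IOE, IOBES, and IO. See `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_
In sequence labeling tasks, the model first groups the input tokens into what are called chunks, and then classifies the tockens in each chunk. Classification can be evaluated with the methods used for classification tasks, while tocken grouping is evaluated with the chunk evaluation method.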
Typo: tocken -> token
ok
|
||
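Sequence Labeling Task Evaluation
---------------------------------
In sequence labeling tasks, the model first groups the input tokens into what are called chunks, and then classifies the tockens in each chunk. Classification can be evaluated with the methods used for classification tasks, while tocken grouping is evaluated with the chunk evaluation method.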
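I suggest changing this to "sequence labeling tasks usually perform chunk segmentation and classification at the same time", so that users do not mistake it for a two-stage process. ChunkEvaluator also covers both of these in its evaluation.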
ok
LGTM |
LGTM, thanks!
add metrics.rst