Add metrics api guide #236

Merged
merged 9 commits into from
Oct 30, 2018

xuezhong
Contributor

add metrics.rst


@shanyi15 shanyi15 added the API Guide docs related to API Guide label Oct 26, 2018
@@ -0,0 +1,61 @@
.. _api_guide_optimizer:
Collaborator

_api_guide_metrics:

Contributor Author

ok

.. _api_guide_optimizer:


Metrics
Collaborator

Please use Chinese for the section title: 评价指标 ("evaluation metrics").
Please also change the section titles below to Chinese.

Contributor Author

ok

------------------

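:code:`Precision` measures, in binary classification, the ratio of true positives to all samples predicted as positive. :code:`Accuracy` measures the ratio of correctly classified samples to the total number of samples. Note that precision and accuracy are defined differently; the distinction is analogous to that between :code:`Variance` and :code:`Bias` in error analysis. :code:`Recall` measures the ratio of true positives to all samples that are actually positive. Precision and recall constrain each other, so a real model has to trade one off against the other; see `Precision_and_recall <https://en.wikipedia.org/wiki/Precision_and_recall>`_ .
:code:`Auc` applies to the evaluation of binary classification models and computes the `area under the ROC curve <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`_. :code:`Auc` is implemented in Python; if performance matters, :code:`fluid.layers.auc` can be used instead.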
Collaborator

  • Precision, :code:`Precision`: XXX
  • Accuracy, :code:`Accuracy`: XXX
  • Recall, :code:`Recall`: XXX
  • Area under the ROC curve, :code:`Auc`: XXX
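
To make the definitions of the first three metrics in the list above concrete, here is a minimal pure-Python sketch that derives precision, recall, and accuracy from the confusion counts of a binary classifier. The function name `binary_classification_metrics` and the sample data are assumptions made for illustration; this is not the `fluid.metrics` implementation.

```python
def binary_classification_metrics(preds, labels):
    """Return (precision, recall, accuracy) for 0/1 predictions and labels.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


preds = [1, 1, 0, 0, 1]
labels = [1, 0, 0, 1, 1]
precision, recall, accuracy = binary_classification_metrics(preds, labels)
# tp=2, fp=1, fn=1, tn=1, so precision = recall = 2/3 and accuracy = 3/5
```

Precision and recall share the numerator (true positives) and differ only in the denominator, which is why raising one tends to lower the other.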

doc/fluid/api/api_guides/low_level/metrics.rst (outdated)

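During or after neural network training, the model's training results need to be evaluated. Evaluation generally computes the distance between all predictions and all ground-truth values (labels). Different models use different metrics: for example, classification models commonly use :code:`AUC` to measure classification quality, while OCR models can use :code:`EditDistance` to measure recognition quality.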
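
As an illustration of the AUC metric mentioned above, the following sketch uses the pairwise formulation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. This value equals the area under the ROC curve. The function name `pairwise_auc` and the data are assumptions; Paddle's own implementation may compute the area differently.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Hypothetical helper; O(P * N) time, intended only for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


auc_value = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
# 3 of the 4 positive/negative pairs are ordered correctly, so the AUC is 0.75
```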

1.MetricBase
Collaborator

  • Remove the numeric prefixes, and order the sections by how often they are used.
  • MetricBase is only needed when defining a custom metric; typical users will not need it, so consider leaving it out.

Contributor Author

ok


For the API reference, see :ref:`api_fluid_metrics_CompositeMetric`

3.Precision/Accuracy/Recall/Auc
Collaborator

This should be introduced first, since it is so widely used.

Contributor Author

ok

@shanyi15 shanyi15 requested review from guoshengCS and shanyi15 and removed request for shanyi15 October 26, 2018 08:09
Metrics
#########

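During or after neural network training, the model's training results need to be evaluated. Evaluation generally computes the distance between all predictions and all ground-truth values (labels). Different models use different metrics: for example, classification models commonly use :code:`AUC` to measure classification quality, while OCR models can use :code:`EditDistance` to measure recognition quality.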
Collaborator

The evaluation method seems tied more closely to the task than to the model. I suggest revisiting where "model" versus "task" is used, for example in "different models use different metrics".

Contributor Author

ok

4.ChunkEvaluator
------------------

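:code:`ChunkEvaluator` is a grouping evaluation metric. It takes the output of the :code:`chunk_eval` interface, accumulates the grouping statistics of each minibatch, and finally computes precision, recall, and F1. :code:`ChunkEvaluator` supports four tagging schemes: IOB, IOE, IOBES, and IO. See `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_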
Collaborator

I suggest translating "chunk" as 语块, and adding an example of a task scenario in which it is used.
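
As one possible task scenario, consider named-entity tagging with the IOB scheme, where `B-X` begins a chunk of type `X`, `I-X` continues it, and `O` marks tokens outside any chunk. The sketch below turns a tag sequence into a set of `(type, start, end)` chunks. The function name `extract_chunks` and the tag names are assumptions for illustration, and the handling of a stray `I-` tag (treated as starting a new chunk) is a simplification rather than the behavior of :code:`chunk_eval`.

```python
def extract_chunks(tags):
    """Return a set of (chunk_type, start, end) from a list of IOB tags.

    Hypothetical helper; indices are inclusive token positions.
    """
    chunks = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ctype
        if start is not None and not continues:
            chunks.add((ctype, start, i - 1))  # close the open chunk
            start, ctype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, ctype = i, label  # open a new chunk
    return chunks


tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC", "O"]
chunks = extract_chunks(tags)
# chunks == {("PER", 0, 1), ("LOC", 3, 5)}
```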

2.CompositeMetric
------------------

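:code:`CompositeMetric` can combine multiple metrics: the predictions and ground-truth values only need to be supplied once per minibatch to obtain the values of several metrics.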
Collaborator

@luotao1, I suggest putting Precision/Accuracy/Recall/Auc first and illustrating CompositeMetric with examples built from those metrics.

Contributor Author

ok
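
The idea of a composite metric, as discussed in this thread, can be sketched in plain Python: each metric keeps its own running counts, and a wrapper forwards every minibatch to all of them. The classes `Precision`, `Recall`, and `Composite` below are hypothetical stand-ins written for this illustration; they are not the `fluid.metrics` classes and their method names are assumptions.

```python
class Precision:
    """Streaming precision: TP / (TP + FP), accumulated over minibatches."""

    def __init__(self):
        self.tp = 0
        self.fp = 0

    def update(self, preds, labels):
        for p, y in zip(preds, labels):
            if p == 1:
                if y == 1:
                    self.tp += 1
                else:
                    self.fp += 1

    def eval(self):
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0


class Recall:
    """Streaming recall: TP / (TP + FN), accumulated over minibatches."""

    def __init__(self):
        self.tp = 0
        self.fn = 0

    def update(self, preds, labels):
        for p, y in zip(preds, labels):
            if y == 1:
                if p == 1:
                    self.tp += 1
                else:
                    self.fn += 1

    def eval(self):
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0


class Composite:
    """Forward each minibatch to every registered metric."""

    def __init__(self):
        self.metrics = []

    def add_metric(self, metric):
        self.metrics.append(metric)

    def update(self, preds, labels):
        for metric in self.metrics:
            metric.update(preds, labels)

    def eval(self):
        return [metric.eval() for metric in self.metrics]


combo = Composite()
combo.add_metric(Precision())
combo.add_metric(Recall())
for preds, labels in [([1, 0, 1], [1, 0, 0]), ([0, 0], [1, 1])]:
    combo.update(preds, labels)  # one call per minibatch feeds both metrics
precision, recall = combo.eval()
# tp=1, fp=1, fn=2 across both minibatches: precision 1/2, recall 1/3
```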

@xuezhong
Contributor Author

I have revised the draft to incorporate everyone's suggestions.


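Evaluation Metrics
##################
During or after neural network training, the model's training results need to be evaluated, which is generally done by computing the distance between all predictions and all ground-truth values (labels); different types of tasks use different evaluation methods.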
Collaborator

After "需要评估模型的训练效果" ("the model's training results need to be evaluated"), change the comma to a period.

Contributor Author

ok

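Sequence Labeling Evaluation
----------------------------
In sequence labeling tasks, the model's primary goal is to group the input tokens; the groups are called chunks (语块).
The chunk evaluation method :code:`ChunkEvaluator` takes the output of the :code:`chunk_eval` interface, accumulates the chunk statistics of each minibatch, and finally computes precision, recall, and F1. :code:`ChunkEvaluator` supports four tagging schemes: IOB, IOE, IOBES, and IO. See `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_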
Collaborator

Line 38 is missing a period at the end.

Contributor Author

ok


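Sequence Labeling Evaluation
----------------------------
In sequence labeling tasks, the model's primary goal is to group the input tokens; the groups are called chunks (语块).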
Collaborator

I suggest revising this: besides being grouped, the tokens also need to be classified. See the README in https://github.com/PaddlePaddle/models/tree/develop/legacy/sequence_tagging_for_ner

Contributor Author

Right, this sentence describes the primary goal. Classification could be mentioned; I left it out only because it is unrelated to what follows.

Edit distance, :code:`EditDistance`, measures the similarity of two strings. See `Edit_distance <https://en.wikipedia.org/wiki/Edit_distance>`_.

For the API reference, see :ref:`api_fluid_metrics_EditDistance`

Collaborator

You could add CompositeMetric at the end.

Contributor Author

CompositeMetric and MetricBase both seem rarely used, so I left them out. They also do not fit well with the task-based organization used above.


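Generation Task Evaluation
--------------------------
A generation task produces output directly from its input. In NLP tasks this means generating a new string, and edit distance can be used to evaluate the distance between the generated string and the target string.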
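Collaborator

I suggest switching the example to an order-preserving task such as OCR or speech recognition (both already available in models).
Also, I personally feel that a "generation task evaluation" category may not be entirely sound: a generation task such as translation can also use a classification metric such as Accuracy. Could everyone check whether there is a better way to organize this? If there is, I suggest adjusting it.

Contributor Author

This evaluation method is mainly used in generation tasks such as translation. That is not to say translation uses only one evaluation method, and this point can be emphasized at the beginning. Organizing by task type is mainly meant to take the user's perspective, so it may not be very rigorous. But a guide is mainly there to orient users; for a precise description they still need to read the API.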
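
For readers unfamiliar with the edit distance discussed in this thread, here is a minimal sketch of the classic Levenshtein dynamic program, which counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into another. The function name `edit_distance` is an assumption for illustration; it is not the `fluid.metrics.EditDistance` class and does no normalization or batching.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (hypothetical helper)."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
# 3 edits: k->s, e->i, and an inserted g
```

Because the recurrence compares symbols position by position, the measure is sensitive to order, which is why it suits tasks such as OCR or speech recognition where the output sequence must keep the order of the input.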

@@ -34,16 +34,18 @@

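Sequence Labeling Evaluation
----------------------------
In sequence labeling tasks, the model's primary goal is to group the input tokens; the groups are called chunks (语块).
The chunk evaluation method :code:`ChunkEvaluator` takes the output of the :code:`chunk_eval` interface, accumulates the chunk statistics of each minibatch, and finally computes precision, recall, and F1. :code:`ChunkEvaluator` supports four tagging schemes: IOB, IOE, IOBES, and IO. See `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_
In sequence labeling tasks, the model first groups the input tokens into chunks (语块) and then classifies the tocken in each chunk. The classification can be evaluated with the evaluation methods for classification tasks, while the tocken grouping is evaluated with the chunk evaluation method.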
Collaborator

Typo: tocken -> token

Contributor Author

ok


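Sequence Labeling Evaluation
----------------------------
In sequence labeling tasks, the model first groups the input tokens into chunks (语块) and then classifies the tocken in each chunk. The classification can be evaluated with the evaluation methods for classification tasks, while the tocken grouping is evaluated with the chunk evaluation method.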
Collaborator

I suggest rewording this to say that sequence labeling tasks usually perform chunk segmentation and classification at the same time, so that users do not mistakenly read it as a two-stage process. ChunkEvaluator's evaluation also covers both.

Contributor Author

ok
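
The point that chunk evaluation covers segmentation and classification together can be illustrated with a small sketch: a predicted chunk counts as correct only when both its span and its type match a gold chunk, and the counts are accumulated across minibatches before precision, recall, and F1 are computed. The class `ChunkCounter`, its method names, and the sample chunks are assumptions made for this illustration, not the `ChunkEvaluator` API.

```python
class ChunkCounter:
    """Accumulate chunk counts over minibatches (hypothetical helper)."""

    def __init__(self):
        self.num_infer = 0    # chunks predicted by the model
        self.num_label = 0    # chunks in the gold annotation
        self.num_correct = 0  # predicted chunks matching span AND type

    def update(self, predicted, gold):
        predicted, gold = set(predicted), set(gold)
        self.num_infer += len(predicted)
        self.num_label += len(gold)
        self.num_correct += len(predicted & gold)

    def eval(self):
        precision = self.num_correct / self.num_infer if self.num_infer else 0.0
        recall = self.num_correct / self.num_label if self.num_label else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1


counter = ChunkCounter()
# Minibatch 1: the span (3, 5) is found, but its type is wrong, so it is not correct.
counter.update(predicted={("PER", 0, 1), ("ORG", 3, 5)},
               gold={("PER", 0, 1), ("LOC", 3, 5)})
# Minibatch 2: span and type both match.
counter.update(predicted={("LOC", 0, 0)}, gold={("LOC", 0, 0)})
precision, recall, f1 = counter.eval()
# 2 correct out of 3 predicted and 3 gold chunks: precision = recall = F1 = 2/3
```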

@luotao1
Collaborator

luotao1 commented Oct 30, 2018

LGTM

Contributor

@shanyi15 shanyi15 left a comment

LGTM,thanks!

@shanyi15 shanyi15 merged commit 211ebcd into PaddlePaddle:develop Oct 30, 2018
@shanyi15 shanyi15 added this to Done in API Guide Oct 30, 2018