DSSM #141

Merged: 13 commits, merged on Jul 13, 2017

Superjomn
Contributor

DSSM model implementation

Supports any combination of three loss functions (CLASSIFICATION / REGRESSION / RANK) and three model structures (FC / RNN / CNN).
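The description implies nine trainable configurations (three losses times three structures). A minimal Python sketch of that grid follows. `ModelType.CLASSIFICATION` appears in the PR's code, but the other member names, the `ModelArch` enum, and `all_configs` are hypothetical and are not taken from the implementation.

```python
from enum import Enum
from itertools import product


class ModelType(Enum):
    # Loss/training modes named in the PR description. Only
    # ModelType.CLASSIFICATION is confirmed by the PR's code; the
    # other members are assumed here for illustration.
    CLASSIFICATION = 0
    REGRESSION = 1
    RANK = 2


class ModelArch(Enum):
    # Hypothetical enum for the three encoder structures.
    FC = 0
    RNN = 1
    CNN = 2


def all_configs():
    """Return every (loss, structure) pair: the full 3 x 3 grid."""
    return list(product(ModelType, ModelArch))


for model_type, arch in all_configs():
    print(model_type.name, arch.name)
```

Keeping the two choices as independent enums is what lets them combine freely instead of hard-coding nine model variants.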

Superjomn requested a review from lcy-seso on July 2, 2017 10:22
dssm/README.md Outdated
@@ -0,0 +1,397 @@
# Deep Structured Semantic Models (DSSM)
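Weighing the semantic distance between two units is a very common need. This document will demonstrate how to use the DSSM model to model the relationship between two strings,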
Collaborator

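  • The sentence “Weighing the semantic distance between two units is a very common need” is not very precise; from this description I could not tell exactly what problem the model in this example is meant to solve.
  • “use the DSSM model to model the relationship between two strings” --> what relationship? A semantic relationship?
  • What exactly can “unit” refer to here? The meaning is somewhat unclear.
  • “This document will demonstrate” --> “This example demonstrates”, to stay consistent with the other chapters.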

dssm/README.md Outdated
The model implementation supports a generic data format: users only need to swap in their own data to train and predict on their own task.

## Background
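DSSM \[[1](##参考文档)\] is a fairly classic semantic model proposed by Microsoft Research in 2013 for learning the semantic distance between two units. It is suited to the following scenarios: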
Collaborator

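  • “fairly classic” --> avoid colloquial descriptions like this. In a technical article, something is either classic or it is not; there is no “fairly classic”.
  • Is there another term for “unit”, or a different word? As written, its meaning is vague and unclear.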

dssm/README.md Outdated
@@ -0,0 +1,397 @@
# Deep Structured Semantic Models (DSSM)
Collaborator

  • Suggested title: 深度结构化语义模型 (Deep Structured Semantic Models, DSSM)
  • Since the document is written in Chinese, give a Chinese rendering of the term wherever possible.

dssm/README.md Outdated

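1. CTR prediction model: measures how strongly a user's search query (Query) is related to a set of candidate web pages (Documents)
2. Text relevance: measures how semantically related two strings are
3. Automatic recommendation: measures how strongly a User is associated with a recommended Item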
Collaborator

Add a full stop at the end of each of items 1 to 3.

dssm/README.md Outdated
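2. Text relevance: measures how semantically related two strings are
3. Automatic recommendation: measures how strongly a User is associated with a recommended Item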

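DSSM has now grown into a framework that can naturally model the distance relationship between two units. For relevance, for example, cosine similarity (COS) can be used to characterize it, and the model can be trained as a classification task or with pairwise rank.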
Collaborator

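Suggested wording: “Today, DSSM has grown into a framework for modeling the distance relationship between two units. For example, for relevance problems, cosine similarity (cosine) can be used to characterize it, [are there other examples? A single example is rather thin support for saying DSSM has become a framework]. The model can be trained with classification or pairwise rank.”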
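To make the cosine-similarity and pairwise-rank ideas concrete, here is a small pure-Python sketch. The vectors are made-up stand-ins for the semantic vectors a DSSM encoder would produce, and the RankNet-style logistic pairwise loss is an assumed choice for illustration, not necessarily the loss this PR implements.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two semantic vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_rank_loss(score_pos, score_neg):
    """RankNet-style logistic loss, log(1 + exp(-(s+ - s-))).

    Small when the relevant pair outscores the irrelevant one,
    large when the order is reversed.
    """
    return math.log1p(math.exp(-(score_pos - score_neg)))


# Made-up vectors standing in for encoder outputs (not real model data).
query = [0.2, 0.7, 0.1]
relevant_doc = [0.25, 0.6, 0.05]
irrelevant_doc = [0.9, 0.0, 0.4]

s_pos = cosine_similarity(query, relevant_doc)
s_neg = cosine_similarity(query, irrelevant_doc)
print(s_pos, s_neg, pairwise_rank_loss(s_pos, s_neg))
```

In the classification variant, the same cosine score would instead feed a per-pair label loss; the rank variant only cares about the relative order of two scores for the same query.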

@@ -0,0 +1,20 @@
新手 汽车 驾驶 驾校 培训 苹果 6s 1
苹果 六 袋 苹果 6s 新手 汽车 驾驶 1
Collaborator

  1. For the sample data, also make the training and test data different.
  2. Provide distinct data and remove duplicates.
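One way to act on both points, sketched under the assumption that each sample is one line of text: deduplicate the lines, then split them so that no sample ends up in both the training and the test set. `dedup_and_split` and its parameters are hypothetical; the sample lines are the ones quoted in the diff above.

```python
import random


def dedup_and_split(lines, test_ratio=0.2, seed=0):
    """Drop duplicate samples, then split so no sample is in both train and test."""
    # dict.fromkeys keeps the first occurrence of each line, in order.
    unique = list(dict.fromkeys(line.strip() for line in lines if line.strip()))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_ratio))
    return unique[n_test:], unique[:n_test]


samples = [
    "新手 汽车 驾驶 驾校 培训 苹果 6s 1",
    "苹果 六 袋 苹果 6s 新手 汽车 驾驶 1",
    "新手 汽车 驾驶 驾校 培训 苹果 6s 1",  # exact duplicate, removed
]
train, test = dedup_and_split(samples, test_ratio=0.5)
print(len(train), len(test))
```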

logger.info("create fc layer [%s] which dimension is %d" %
            (name, dim))
fc = paddle.layer.fc(
    name=name,
Collaborator

If the name is not referenced explicitly elsewhere, do not specify it.

Contributor Author

done

ModelType.CLASSIFICATION)

for rcd in dataset.train():
    print rcd
Collaborator

Remove the obsolete comments.

Contributor Author

done

dssm/utils.py Outdated

UNK = 0

logger = logging.getLogger("logger")
Collaborator

Just get the existing “paddle” logger directly; there is no need to define a new one.
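A minimal sketch of the suggestion using only the standard `logging` module. `logging.getLogger` returns the same object for the same name, so every module that asks for "paddle" shares one logger and whatever handlers and level are configured on it. That PaddlePaddle registers its own logger under exactly the name "paddle" is taken from the review comment, not verified here; `get_shared_logger` is a hypothetical helper.

```python
import logging

# Reuse the logger named "paddle" (name taken from the review comment)
# instead of creating a separate one such as logging.getLogger("logger").
logger = logging.getLogger("paddle")


def get_shared_logger():
    # Any later call with the same name returns the identical object,
    # so configuration applied once is visible everywhere.
    return logging.getLogger("paddle")
```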

Collaborator

lcy-seso left a comment

LGTM and we will keep on refining this example.

lcy-seso merged commit 0cd7b3a into PaddlePaddle:develop on Jul 13, 2017
Superjomn mentioned this pull request on Jul 17, 2017