refactor ctr model #138

Merged
merged 7 commits into from
Jul 17, 2017

Superjomn
Contributor

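Changes:

  1. Decoupled the model from the format of the example data. Users now only need to prepare data in the expected format and can train and predict directly, without modifying the code.
  2. Added a prediction script, infer.py.
  3. Added support for two task types, classification and regression, selectable through a command-line argument.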
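
As an illustration of item 3, the sketch below shows one way a command-line argument could switch between the two task types. The `--task` flag and the `build_parser` and `output_and_loss` helpers are hypothetical names, not the options this PR actually adds. It assumes classification uses a sigmoid output with cross-entropy loss and regression uses a linear output with squared error.

```python
import argparse
import math


def build_parser():
    """Build a parser with a hypothetical --task switch (not the PR's real flag)."""
    parser = argparse.ArgumentParser(description="CTR task selection (illustrative)")
    parser.add_argument(
        "--task",
        choices=["classification", "regression"],
        default="classification",
        help="classification: predict click probability; regression: predict a real value",
    )
    return parser


def output_and_loss(task, score, label):
    """Map a raw model score to (output, per-example loss) for the chosen task."""
    if task == "classification":
        # Sigmoid squashes the score into a probability.
        prob = 1.0 / (1.0 + math.exp(-score))
        # Clamp to keep log() finite at the extremes.
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)
        loss = -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))
        return prob, loss
    # Regression: linear output with squared error.
    return score, (score - label) ** 2
```

A caller would parse the arguments once, for example `args = build_parser().parse_args()`, and pass `args.task` to the code that builds the output layer and the cost.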

@Superjomn Superjomn requested a review from lcy-seso June 29, 2017 02:07
ctr/README.md Outdated
├── train.py # training script
└── utils.py # helper functions
```

## Background

CTR (Click-Through Rate prediction)\[[1](https://en.wikipedia.org/wiki/Click-through_rate)\] represents the probability that a user clicks on a particular link,
Collaborator

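  1. First sentence: "represents the probability that a user clicks on a particular link, and is commonly used to measure the effectiveness of an online advertising system." --> "predicts the probability that a user clicks on a particular link, and is an important step in ad serving. Accurate click-through rate prediction is important for maximizing the revenue of an online advertising system."
  2. Line 11, from @llxxxll: the term "recall" (召回) is unfamiliar to novice users; explain it or describe it another way.
  3. This section is split into paragraphs that are too fragmented; merge lines 23 and 24 into the first paragraph.
  4. Line 24: "The system will generally perform the following steps to display ads" --> "Roughly speaking, the system performs the following steps to display ads:"
  5. Line 31: drop "very" (很): "very important" --> "important".
  6. Merge lines 53 to 62 into one paragraph.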

ctr/README.md Outdated
For the specific feature processing method, see [data process](./dataset.md).

The input format of the model demonstrated in this tutorial is as follows:

Collaborator

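  1. Line 69: does "click" refer to the click-through rate? At line 69 the data has not been introduced yet, so the reader cannot connect it to the `click` field in the data. Please explain this more clearly.
  2. Merge lines 79 and 81 into one paragraph.
  3. After line 81, could you add some text that further explains the format?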

ctr/README.md Outdated
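@@ -61,8 +76,40 @@ LR's advantage over the DNN model is its capacity for large-scale sparse features, including

We use the dataset\[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\] from the `Click-through rate prediction` task on Kaggle to demonstrate the model.

For the specific feature processing method, see [data process](./dataset.md)
For the specific feature processing method, see [data process](./dataset.md).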
Collaborator

Line 77: "We use the dataset from the Click-through rate prediction task on Kaggle to demonstrate the model." --> "We use the dataset from the Click-through rate prediction task on Kaggle to run the model in this example."

ctr/README.md Outdated
23 231 \t 1230:0.12 13421:0.9 \t 1
```

The demo dataset\[[2](#参考文档)\] can be processed with the `avazu_data_processor.py` script; see the instructions below for usage:
Collaborator

The avazu_data_processor.py script in this example's directory can process the downloaded raw data; see the instructions below for usage:
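
For readers following the format discussion in this thread, here is a minimal sketch of parsing one record such as the sample line `23 231 \t 1230:0.12 13421:0.9 \t 1` quoted from the README. It assumes three tab-separated fields: space-separated integer ids for the DNN input, space-separated `index:value` pairs for the LR input, and a numeric click label. That field interpretation and the `parse_record` helper are assumptions for illustration, not code from this PR.

```python
def parse_record(line):
    """Split one tab-separated record into (dnn_ids, lr_features, label).

    Assumed layout (not confirmed by the PR):
      field 1: space-separated integer ids for the DNN input
      field 2: space-separated index:value pairs for the LR input
      field 3: the click label
    """
    dnn_part, lr_part, label_part = line.rstrip("\n").split("\t")

    # Dense-side ids, e.g. "23 231" -> [23, 231].
    dnn_ids = [int(token) for token in dnn_part.split()]

    # Sparse-side pairs, e.g. "1230:0.12" -> (1230, 0.12).
    lr_features = []
    for token in lr_part.split():
        index, value = token.split(":")
        lr_features.append((int(index), float(value)))

    # float() tolerates surrounding spaces, so " 1" parses as 1.0.
    label = float(label_part)
    return dnn_ids, lr_features, label
```

Reading the label as a float keeps the same parser usable for both 0/1 click labels and real-valued targets.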

ctr/README.md Outdated
├── network_conf.py # model network configuration
├── reader.py # data provider
├── train.py # training script
└── utils.py # helper functions
Collaborator

  • avazu_data_processor.py is not listed here. Is that deliberate? This file is quite important.
  • The images directory can simply be omitted.

self.output = layer.fc(
input=merge_layer,
size=1,
name='output',
Collaborator

Same as above: remove `name`.

@@ -0,0 +1,112 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Collaborator

Some scripts have a shebang and others do not. Keep them consistent: either remove it everywhere or add it everywhere.

Contributor Author

done

ctr/utils.py Outdated
import logging

logging.basicConfig()
logger = logging.getLogger("logger")
Collaborator

  • You can call logging.getLogger("paddle") directly to get the logger defined in config_parser.py.
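
A minimal sketch of this suggestion, using only the standard `logging` module: `logging.getLogger` returns the same logger object for the same name, so a helper module can reuse an existing logger instead of creating a separate one. That the framework registers its logger under the name "paddle" is taken from the comment above; `LOGGER_NAME` and `get_shared_logger` are hypothetical names for illustration.

```python
import logging

# Assumption from the review comment: the framework's logger is named "paddle".
LOGGER_NAME = "paddle"

# Attach a default handler to the root logger, as the original utils.py does.
logging.basicConfig()


def get_shared_logger():
    """Return the process-wide logger registered under LOGGER_NAME."""
    return logging.getLogger(LOGGER_NAME)


logger = get_shared_logger()
logger.setLevel(logging.INFO)
```

Because loggers are looked up by name, any level or handler configured on this object also applies to every other module that asks for the same name.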

ctr/train.py Outdated
# n_records_as_test=args.test_set_size,
# fields=reader.fields,
# feature_dims=reader.feature_dims)

Collaborator

If these comments are no longer needed, delete them.

for key in id_features) + 1
# logger.warning("dump dataset's meta info to %s" % meta_out_path)
# cPickle.dump([feature_dims, fields], open(meta_out_path, 'wb'))

Collaborator

Delete comments that are not needed.

@Superjomn
Contributor Author

Running training and testing has problems; this needs to be checked.

Collaborator

@lcy-seso lcy-seso left a comment

LGTM

ctr/README.md Outdated

1. Recall the set of ads that satisfy the query
1. Retrieve the set of ads that satisfy the query
Collaborator

I do not understand what "satisfy the query" means here.
Could you add a few more descriptive words?

@Superjomn Superjomn merged commit e88101b into PaddlePaddle:develop Jul 17, 2017
@Superjomn Superjomn deleted the ctr2 branch July 17, 2017 08:27