[docs] Update README, docs and workflow #13

Merged: 3 commits, Dec 27, 2021
2 changes: 1 addition & 1 deletion .github/workflows/ubuntu18.04-py3.6-cibuild.yaml
@@ -3,7 +3,7 @@ name: CPU Tests
on: workflow_dispatch

env:
- IMAGE: registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-developer:1.15deeprec-dev-cpu-cibuild-py36-ubuntu18.04
+ IMAGE: registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-cpu-cibuild-py36-ubuntu18.04
JOBNAME: deeprec-ci-cpu-${{ github.run_id }}
PODNAME: deeprec-ci-cpu-${{ github.run_id }}-chief-0
BAZEL_CACHE: ${{ secrets.BAZEL_CACHE }}
2 changes: 1 addition & 1 deletion .github/workflows/ubuntu18.04-py3.6-cuda11.2-cibuild.yaml
@@ -3,7 +3,7 @@ name: GPU Tests
on: workflow_dispatch

env:
- IMAGE: registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-developer:1.15deeprec2106-gpu-cibuild-py36-cu110-ubuntu18.04
+ IMAGE: registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-gpu-cibuild-py36-cu110-ubuntu18.04
JOBNAME: deeprec-ci-gpu-${{ github.run_id }}
PODNAME: deeprec-ci-gpu-${{ github.run_id }}-chief-0
BAZEL_CACHE: ${{ secrets.BAZEL_CACHE }}
18 changes: 7 additions & 11 deletions README.md
@@ -13,7 +13,7 @@ Sparse model is a type of deep learning model that accounts for a relatively hig
DeepRec has been deeply cultivated since 2016, which supports core businesses such as Taobao Search, recommendation and advertising. It precipitates a list of features on basic frameworks and has excellent performance in sparse models training. Facing a wide variety of external needs and the environment of deep learning framework embracing open source, DeepeRec open source is conducive to establishing standardized interfaces, cultivating user habits, greatly reducing the cost of external customers working on cloud and establishing the brand value.

### **Key Features**
- DeepRec has super large-scale distributed training capability, supporting model training of trillion samples and 100 billion Embedding Processing. For sparse model scenarios, in-depth performance optimization has ben conducted across CPU and GPU platform. It contains 3 kinds of features to improve usability and performance for super-scale scenarios.
+ DeepRec has super large-scale distributed training capability, supporting model training of trillion samples and 100 billion Embedding Processing. For sparse model scenarios, in-depth performance optimization has been conducted across CPU and GPU platform. It contains 3 kinds of features to improve usability and performance for super-scale scenarios.
#### **Sparse Functions**
- Embedding Variable.
- Dynamic Dimension Embedding Variable.
@@ -41,14 +41,14 @@ DeepRec has super large-scale distributed training capability, supporting model
CPU Platform

```
- registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-developer:1.15deeprec2106-cpu-py36-ubuntu18.04
+ registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-cpu-py36-ubuntu18.04
```

GPU Platform


```
- registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-developer:1.15deeprec2106-gpu-py36-cu110-ubuntu18.04
+ registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-gpu-py36-cu110-ubuntu18.04
```
### **How to Build**

@@ -94,17 +94,13 @@ registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-training:deeprec-nightly-gp
registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-training:deeprec-nightly-cpu-py36-ubuntu18.04
```

### **Jave Compilation**
```
$ ./configure
$ bazel build --config opt //tensorflow/java:tensorflow //tensorflow/java:libtensorflow_jni
$ javac -cp bazel-bin/tensorflow/java/libtensorflow.jar ...
$ java -cp bazel-bin/tensorflow/java/libtensorflow.jar -Djava.library.path=bazel-bin/tensorflow/java ...

```
***
## **User Document (Chinese)**

[https://deeprec.readthedocs.io/en/latest/](https://deeprec.readthedocs.io/en/latest/)

***
## **License**

[Apache License 2.0](LICENSE)

1 change: 0 additions & 1 deletion docs/Dynamic-dimension-Embedding-Variable.md
@@ -79,7 +79,6 @@ def embedding_lookup(
```
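## Usage Example
### Dynamic Dimension Adjustment Strategy
- > The following content was shared by colleagues on the Alibaba homepage "Guess You Like" team

Each feature has two statistics: the cumulative feature frequency freq_acc and the rate at which the feature appears in the current period, freq_current_speed. Both are initialized to 0. freq_acc is not updated during normal operation; it is updated at specific steps according to a fixed rule. freq_current_speed is updated as the feature is accessed: each access increments it by 1, and it is reset to 0 at specific steps.
The rule currently used to update freq_acc is as follows: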
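
The update rule itself is not visible in this hunk, so the sketch below only illustrates the bookkeeping around the two counters and takes the rule as a parameter. The names `FeatureStats`, `FrequencyTracker`, `update_interval` and `placeholder_rule` are hypothetical and are not DeepRec APIs. It also assumes, for simplicity, that freq_acc is updated and freq_current_speed is reset at the same step; the source does not confirm this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class FeatureStats:
    freq_acc: float = 0.0        # cumulative frequency, initialized to 0
    freq_current_speed: int = 0  # accesses in the current period, initialized to 0


class FrequencyTracker:
    """Hypothetical bookkeeping for the two per-feature statistics."""

    def __init__(self, update_rule: Callable[[float, int], float],
                 update_interval: int) -> None:
        if update_interval <= 0:
            raise ValueError("update_interval must be positive")
        # update_rule(freq_acc, freq_current_speed) -> new freq_acc.
        # The real rule is not shown in this hunk, so it is injected.
        self.update_rule = update_rule
        self.update_interval = update_interval
        self.stats: Dict[Hashable, FeatureStats] = {}

    def access(self, feature_id: Hashable) -> None:
        # Every access increments the current-period counter by 1.
        self.stats.setdefault(feature_id, FeatureStats()).freq_current_speed += 1

    def on_step(self, step: int) -> None:
        # Assumption: both counters are handled every update_interval steps.
        if step <= 0 or step % self.update_interval != 0:
            return
        for s in self.stats.values():
            s.freq_acc = self.update_rule(s.freq_acc, s.freq_current_speed)
            s.freq_current_speed = 0


def placeholder_rule(freq_acc: float, freq_current_speed: int) -> float:
    # Placeholder only: a plain running sum, NOT the rule used by DeepRec.
    return freq_acc + freq_current_speed


tracker = FrequencyTracker(placeholder_rule, update_interval=2)
for _ in range(3):
    tracker.access("item_42")
tracker.on_step(1)  # not an update step: counters are left untouched
tracker.on_step(2)  # update step: freq_acc is recomputed, speed is reset
```

Passing the rule in as a callable keeps the counter handling separate from whichever formula the documentation goes on to define.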
56 changes: 46 additions & 10 deletions docs/index.md
@@ -3,37 +3,73 @@
</h1>

# Introduction
DeepRec is a recommendation engine based on [TensorFlow 1.15](https://www.tensorflow.org/), [Intel-TensorFlow](https://github.com/Intel-tensorflow/tensorflow) and [NVIDIA-TensorFlow](https://github.com/NVIDIA/tensorflow).
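Sparse models are a class of deep learning models in which the computation logic for discrete features makes up a relatively high proportion of the model structure. Discrete features are typically non-numeric features that algorithms cannot process directly, such as ids, tags, text, and phrases; they are widely used in high-value businesses such as search, advertising, and recommendation. Today's mainstream open-source deep learning frameworks offer insufficient support for sparse models. Gaps in sparse functionality and training performance constrain the exploration and development of sparse models.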


Sparse model is a type of deep learning model that accounts for a relatively high proportion of discrete feature calculation logic in the model structure. Discrete features are usually expressed as non-numeric features that cannot be directly processed by algorithms such as id, tag, text, and phrases. They are widely used in high-value businesses such as search, advertising, and recommendation.


DeepRec has been deeply cultivated since 2016, which supports core businesses such as Taobao Search, recommendation and advertising. It precipitates a list of features on basic frameworks and has excellent performance in sparse models training. Facing a wide variety of external needs and the environment of deep learning framework embracing open source, DeepeRec open source is conducive to establishing standardized interfaces, cultivating user habits, greatly reducing the cost of external customers working on cloud and establishing the brand value.


DeepRec has super large-scale distributed training capability, supporting model training of trillion samples and 100 billion Embedding Processing. For sparse model scenarios, in-depth performance optimization has ben conducted across CPU and GPU platform. It contains 3 kinds of features to improve usability and performance for super-scale scenarios.
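DeepRec (PAI-TF) supports core businesses such as Taobao Search, Guess You Like (猜你喜欢), targeted advertising (定向), and Zhitongche (直通车), and underpins ultra-large-scale sparse training with hundreds of billions of features and trillions of samples. It has accumulated core functionality and performance optimizations for sparse scenarios. It is deeply optimized for sparse models in distributed training, graph optimization, operators, and the runtime, and it provides features specific to sparse scenarios, including dynamic elastic features, dynamic elastic dimensions, Multi-Hash Embedding, adaptive EmbeddingVariable, and incremental model export and loading.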

# Contents

```{toctree}
:maxdepth: 2
:caption: Sparse Functions

Embedding-Variable
Feature-Eviction
Dynamic-dimension-Embedding-Variable
Adaptive-Embedding
Multi-Hash-Variable
```

```{toctree}
:maxdepth: 2
:caption: Distributed Training

GRPC++
StarServer
```

```{toctree}
:maxdepth: 2
:caption: Graph Optimization

Auto-Micro-Batch
Fused-Embedding
Smart-Stage
```

```{toctree}
:maxdepth: 2
:caption: Runtime Optimization

TensorPoolAllocator
WorkQueue
```

```{toctree}
:maxdepth: 2
:caption: Model Export

Incremental-Checkpoint
```

```{toctree}
:maxdepth: 2
:caption: Optimizers

AdamAsync-Optimizer
AdagradDecay-Optimizer
```

```{toctree}
:maxdepth: 2
:caption: Operators and Hardware Acceleration

NVIDIA-TF32
oneDNN
```

```{toctree}
:maxdepth: 2
:caption: Sample Reading and Dataset

WorkQueue
```