Release v0.1.1 beta.3 #235

Merged · 61 commits · Jul 23, 2017
c1c28c7
tutorial and usage update
Jun 26, 2017
93f3c58
first add
Jun 26, 2017
f0da31d
rm not need
Jun 26, 2017
8247b5d
fix bugs
Jun 26, 2017
5135d6c
add logging
Jun 27, 2017
32b7ea7
add docker file
Jun 27, 2017
279060b
fix by yancey's comment
Jun 27, 2017
7448274
update
Jun 27, 2017
9f52351
save parameter
Yancey1989 Jun 27, 2017
fdbe8c0
fix yaml
Jun 27, 2017
1357ad8
fix bugs
gongweibao Jun 27, 2017
67c1d9f
fix bugs
Jun 27, 2017
599d873
fix convert bug
gongweibao Jun 27, 2017
aa0f747
fix by wuyi's comment
Jun 27, 2017
deab719
add readme
gongweibao Jun 27, 2017
1ab5b67
Merge branch 'convertdataset' of https://github.com/gongweibao/cloud …
gongweibao Jun 27, 2017
53f8417
rm logging.confg
gongweibao Jun 27, 2017
af4dbfd
modify README.md
gongweibao Jun 28, 2017
a0017f0
modify README.md
gongweibao Jun 28, 2017
87dd9d7
add start command
Jun 28, 2017
f950d13
update
Yancey1989 Jun 28, 2017
b5e27bf
Merge pull request #189 from Yancey1989/save_paramters
Yancey1989 Jun 28, 2017
19eff0c
Merge pull request #183 from gongweibao/convertdataset
gongweibao Jun 29, 2017
f1e1588
update by comments
Jun 29, 2017
7aa6486
Merge pull request #181 from typhoonzero/update_doc
typhoonzero Jun 29, 2017
2f0ebbf
upload files with recursion
Yancey1989 Jun 30, 2017
2f59037
prettify output
Jul 4, 2017
dde9e8a
remove replica set name
Jul 4, 2017
453c2c0
Merge pull request #200 from typhoonzero/prettify_output
typhoonzero Jul 4, 2017
493f1dd
recursion to loop
Yancey1989 Jul 5, 2017
e664206
Merge pull request #193 from Yancey1989/upload_file_recursion
Yancey1989 Jul 5, 2017
3b5e00d
dlnel index page (#194)
Yancey1989 Jul 6, 2017
fa66e4e
update tutorial (#202)
Yancey1989 Jul 6, 2017
2bc48db
test format
gongweibao Jul 7, 2017
310521b
modify travis.yaml
gongweibao Jul 7, 2017
0d4772c
fix
gongweibao Jul 7, 2017
693c7c8
fix
gongweibao Jul 7, 2017
1d30410
fix sudo
gongweibao Jul 7, 2017
bbc7cf7
add travis
gongweibao Jul 7, 2017
3d06084
add glide
gongweibao Jul 7, 2017
9ce8bda
add gimme
gongweibao Jul 7, 2017
fecfe65
fix style
gongweibao Jul 7, 2017
c7b205a
fix style
gongweibao Jul 7, 2017
fb33706
add files
gongweibao Jul 7, 2017
03b73db
modify sh
gongweibao Jul 7, 2017
284e69a
fix by wuyi's comments
gongweibao Jul 11, 2017
b00a1f6
Merge pull request #204 from gongweibao/goprecommit
gongweibao Jul 11, 2017
2573a2d
fix pre-commit bugs
gongweibao Jul 11, 2017
e79391d
Merge pull request #209 from gongweibao/precommit
gongweibao Jul 11, 2017
e457b04
Format quota print (#205)
Yancey1989 Jul 12, 2017
4779205
add sleep for pserver get ready (#216)
Yancey1989 Jul 13, 2017
efe6abd
Update readme (#214)
typhoonzero Jul 13, 2017
826df41
Enable ingress notebook access (#219)
typhoonzero Jul 17, 2017
3e53a5c
test ok
gongweibao Jul 18, 2017
b5e235c
fix
gongweibao Jul 18, 2017
f5c3314
fix
gongweibao Jul 18, 2017
5e70a76
Merge pull request #225 from gongweibao/review
gongweibao Jul 18, 2017
013ee18
fix login url (#229)
typhoonzero Jul 19, 2017
0fa45d9
Fix invalid job path (#227)
Yancey1989 Jul 19, 2017
f916c16
check job name in clint (#231)
Yancey1989 Jul 20, 2017
009314e
[Done]Fault tolerant job (#212)
typhoonzero Jul 20, 2017
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,2 +1,4 @@
.Python
*.crt
.cache
vendor
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,7 @@
- repo: https://github.com/dnephin/pre-commit-golang
  sha: e4693a4c282b4fc878eda172a929f7a6508e7d16
  hooks:
    - id: go-fmt
      files: \.go$
    - id: go-lint
      files: \.go$
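The `files: \.go$` patterns above scope both hooks to Go sources only. As a quick sanity check of what that regex selects (shown in Python purely for illustration; the file list is made up):

```python
import re

# The same pattern the pre-commit hooks use to select files.
go_files = re.compile(r"\.go$")

paths = ["master/main.go", "README.md", "vendor/pkg/util.go", "notes.gold"]
matched = [p for p in paths if go_files.search(p)]
print(matched)  # ['master/main.go', 'vendor/pkg/util.go']
```

Note that the trailing `$` is what keeps `notes.gold` out: the pattern requires `.go` at the end of the path, not merely somewhere inside it.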
20 changes: 20 additions & 0 deletions .tools/check_style.sh
@@ -0,0 +1,20 @@
#!/bin/bash
function abort(){
    echo "Your change doesn't follow PaddleCloud's code style." 1>&2
    echo "Please use pre-commit to reformat your code and git push again." 1>&2
    exit 1
}

trap 'abort' 0
set -e

cd $TRAVIS_BUILD_DIR
export PATH=/usr/bin:$PATH
pre-commit install
pre-commit --version

if ! pre-commit run -a ; then
    git diff --exit-code
fi

trap : 0
12 changes: 11 additions & 1 deletion .travis.yml
@@ -2,7 +2,17 @@ matrix:
  include:
    - language: go
      go: 1.8.x
      script: bash .tools/gen_config.sh && cd go && go test ./...
      sudo: required
      before_script:
        - eval "$(GIMME_GO_VERSION=1.8.3 gimme)"
        - go get -u github.com/golang/lint/golint
        - curl https://glide.sh/get | bash
        - sudo pip install pre-commit
      script:
        - |
          bash .tools/check_style.sh
          RESULT=$?; if [ $RESULT -eq 0 ]; then true; else false; fi;
        - bash .tools/gen_config.sh && cd go && glide install && go test $(glide novendor)
    - language: python
      python: 2.7
      sudo: required
77 changes: 49 additions & 28 deletions README.md
@@ -1,82 +1,104 @@
# PaddlePaddle Cloud

PaddlePaddle Cloud is a Distributed Deep-Learning Cloud Platform for both cloud
providers and enterprises.

PaddlePaddle Cloud uses [Kubernetes](https://kubernetes.io) as its backend job
dispatching and cluster resource management center, and uses [PaddlePaddle](https://github.com/PaddlePaddle/Paddle.git)
as the deep-learning framework. Users can use web pages or command-line tools
to submit their deep-learning training jobs remotely to make use of the power of
large-scale GPU clusters.

[English tutorials](./doc/usage_en.md)
## Using Command-line To Submit Cloud Training Jobs

[中文手册](./doc/usage_cn.md)


## Deploy PaddlePaddle Cloud

### Pre-Requirements
- PaddlePaddle Cloud needs python to support `OPENSSL 1.2`. To check it out, simply run:
```python
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2k 26 Jan 2017'
```
- Make sure you have `Python > 2.7.10` installed.
- PaddlePaddle Cloud uses Kubernetes as its backend core; deploy a Kubernetes cluster
using [Sextant](https://github.com/k8sp/sextant) or any tool you like.


### Run on kubernetes
- Build Paddle Cloud Docker Image

```bash
# build docker image
git clone https://github.com/PaddlePaddle/cloud.git
cd cloud/paddlecloud
docker build -t [your_docker_registry]/pcloud .
# push to registry so that we can submit paddlecloud to kubernetes
docker push [your_docker_registry]/pcloud
```
- We use [volume](https://kubernetes.io/docs/concepts/storage/volumes/) to mount MySQL data,
cert files, and settings; in the `k8s/` folder we have some samples showing how to mount
stand-alone files and settings using [hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath). Here's
a good tutorial on creating Kubernetes certs: https://coreos.com/kubernetes/docs/latest/getting-started.html

- create data folder on a Kubernetes node, such as:
```bash
mkdir -p /home/pcloud/data/mysql
mkdir -p /home/pcloud/data/certs
```
- Copy the Kubernetes CA files (ca.pem, ca-key.pem, ca.srl) to the `/home/pcloud/data/certs` folder
- Copy the Kubernetes admin user key (admin.pem, admin-key.pem) to the `/home/pcloud/data/certs` folder
- Optional: copy the CephFS key file (admin.secret) to the `/home/pcloud/data/certs` folder
- Copy the `paddlecloud/settings.py` file to the `/home/pcloud/data` folder

- Configure `cloud_deployment.yaml`
- `spec.template.spec.containers[0].volumes`: change the `hostPath` to match your data folder.
- `spec.template.spec.nodeSelector`: edit the `kubernetes.io/hostname` value to the host that holds the data folder. You can use `kubectl get nodes` to list all the Kubernetes nodes.
- Configure `settings.py`
- Add your domain name to `ALLOWED_HOSTS`.
- Configure `DATACENTERS` to your backend storage; CephFS and HostPath are currently supported.
You can use HostPath mode to make use of shared file systems like NFS.
- Configure `cloud_ingress.yaml` if your Kubernetes cluster is using [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/)
to proxy HTTP traffic, or configure `cloud_service.yaml` to use [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport)
- If using ingress, configure `spec.rules[0].host` to your domain name
- Deploy cloud on Kubernetes
- `kubectl create -f k8s/cloud_deployment.yaml`
- `kubectl create -f k8s/cloud_service.yaml`
- `kubectl create -f k8s/cloud_ingress.yaml` (optional)
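The `DATACENTERS` setting mentioned above lives in `paddlecloud/settings.py` (a Django settings module). The exact schema is defined there, so the field names below are illustrative assumptions only — a minimal sketch of what one CephFS backend and one HostPath backend might look like:

```python
# Hypothetical sketch of a DATACENTERS setting; the field names are
# assumptions for illustration, not the exact schema — consult the real
# paddlecloud/settings.py before deploying.
DATACENTERS = {
    "datacenter1": {
        "fstype": "cephfs",
        "monitors_addr": ["192.168.1.10:6789"],  # Ceph monitor endpoints
        "secret": "ceph-secret",
        "mount_path": "/pfs/%s/home/%s/",        # filled with (datacenter, username)
    },
    "public": {
        "fstype": "hostpath",
        "host_path": "/mnt/nfs/",                # shared NFS mount present on every node
        "mount_path": "/pfs/%s/public/",
    },
}

# The server would select a backend by datacenter name:
dc = DATACENTERS["datacenter1"]
print(dc["fstype"])  # cephfs
```

The HostPath entry is how a shared file system such as NFS can be used: mount it at the same path on every node, then point `host_path` at that mount.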


To test or visit the website, find out the kubernetes ingress IP
addresses, or the NodePort.

Then open your browser and visit http://<ingress-ip-address>, or
http://<any-node-ip-address>:<NodePort>

- Prepare public dataset

You can create a Kubernetes Job for preparing the public dataset and cluster trainer files.
```bash
kubectl create -f k8s/prepare_dataset.yaml
```


### Run locally without docker

- You still need a Kubernetes cluster when running locally.
- Make sure you have `Python > 2.7.10` installed.
- Python needs to support `OPENSSL 1.2`. To check it out, simply run:
```python
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2k 26 Jan 2017'
```
- Make sure you are using a virtual environment of some sort (e.g. `virtualenv` or
`pyenv`).

```
virtualenv paddlecloudenv
# enable the virtualenv
source paddlecloudenv/bin/activate
```

To run for the first time, you need to:

```
npm install
pip install -r requirements.txt
@@ -102,4 +124,3 @@ EMAIL_BACKEND = 'django_sendmail_backend.backends.EmailBackend'
You may need to use `hostNetwork` for your pod when using the mail command.

Or you can use Django SMTP bindings; refer to https://docs.djangoproject.com/en/1.11/topics/email/

8 changes: 7 additions & 1 deletion demo/fit_a_line/train.py
@@ -1,5 +1,8 @@
import paddle.v2 as paddle
import pcloud.dataset.uci_housing as uci_housing
import os
import gzip
trainer_id = os.getenv("PADDLE_INIT_TRAINER_ID")

def main():
    # init
@@ -34,7 +37,10 @@ def event_handler(event):
            reader=paddle.batch(uci_housing.test(), batch_size=2),
            feeding=feeding)
        print "Test %d, Cost %f" % (event.pass_id, result.cost)

        if trainer_id == "0":
            with gzip.open("fit-a-line_pass_%05d.tar.gz" % event.pass_id,
                           "w") as f:
                parameters.to_tar(f)
    # training
    trainer.train(
        reader=paddle.batch(
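The hunk above gates checkpointing on `PADDLE_INIT_TRAINER_ID`, so that of N parallel trainers only trainer 0 writes the archive. A minimal self-contained sketch of that pattern (the helper name and the fake parameter bytes are ours, not part of the demo — the real code calls `parameters.to_tar(f)` inside the event handler):

```python
import gzip
import os
import tempfile

def save_on_chief(parameters_bytes, pass_id, trainer_id, out_dir):
    """Write a gzipped snapshot only on trainer 0, so N identical
    trainers don't all write the same archive to shared storage."""
    if trainer_id != "0":
        return None  # non-chief trainers skip checkpointing
    path = os.path.join(out_dir, "fit-a-line_pass_%05d.tar.gz" % pass_id)
    with gzip.open(path, "wb") as f:
        f.write(parameters_bytes)  # stand-in for parameters.to_tar(f)
    return path

out_dir = tempfile.mkdtemp()
# Trainer "1" writes nothing; trainer "0" produces the archive.
assert save_on_chief(b"fake-params", 3, "1", out_dir) is None
path = save_on_chief(b"fake-params", 3, "0", out_dir)
print(os.path.basename(path))  # fit-a-line_pass_00003.tar.gz
```

The string comparison against `"0"` mirrors the demo: `os.getenv` returns a string (or `None` outside the cluster), not an integer.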
150 changes: 150 additions & 0 deletions doc/tutorial_cn.md
@@ -0,0 +1,150 @@
# Submitting Your First Training Job

---

## Download and configure paddlecloud

`paddlecloud` is the command-line client for submitting distributed training jobs to PaddlePaddleCloud.

Step 1: Visit https://github.com/PaddlePaddle/cloud/releases and download the latest `paddlecloud`
binary client for your operating system. Copy `paddlecloud` to a directory on your $PATH, such as `/usr/local/bin`, then add execute permission:
`chmod +x /usr/local/bin/paddlecloud`

|Operating system|Binary|
|--|--|
|Mac OSX|paddlecloud.dawin|
|Windows|paddlecloud.exe|
|Linux|paddlecloud.x86_64|

Step 2: Create the file `~/.paddle/config` (on Windows, create `.paddle\config` under the current user's home directory) with the following content:

```yaml
datacenters:
- name: dlnel
username: [your user name]
password: [secret]
endpoint: http://cloud.dlnel.com
current-datacenter: dlnel
```

The config file specifies the endpoint of the PaddlePaddleCloud cluster and the user's login credentials:
- name: a custom datacenter name; any string is fine
- username: the PaddlePaddleCloud user name, usually an email address; before registration is opened, accounts must be requested from the administrator
- password: the password for the account
- endpoint: the API address of the PaddlePaddleCloud cluster, available from the cluster administrator
- current-datacenter: the datacenter to use as the current one for operations

Once the config file is in place, running `paddlecloud` prints the client's help message:

```
Usage: paddlecloud <flags> <subcommand> <subcommand args>

Subcommands:
commands list all command names
delete Delete the specify resource.
file Simple file operations.
get Print resources
help describe subcommands and their syntax
kill Stop the job. -rm will remove the job from history.
logs Print logs of the job.
registry Add registry secret on paddlecloud.
submit Submit job to PaddlePaddle Cloud.

Subcommands for PFS:
cp uoload or download files
ls List files on PaddlePaddle Cloud
mkdir mkdir directoies on PaddlePaddle Cloud
rm rm files on PaddlePaddle Cloud


Use "paddlecloud flags" for a list of top-level flags
```

## Download the demo code and submit a job

With the configuration above in place, you can submit a sample cluster training job right away. We have prepared some sample code to help you understand how cluster training jobs are submitted; fetch the sample code and submit a job with the commands below.

The samples are based on [paddle book](https://github.com/PaddlePaddle/book); see paddle book for an explanation of each sample.

```bash
mkdir fit_a_line
cd fit_a_line
wget https://raw.githubusercontent.com/PaddlePaddle/cloud/develop/demo/fit_a_line/train.py
cd ..
paddlecloud submit -jobname fit-a-line -cpu 1 -gpu 1 -parallelism 1 -entry "python train.py" fit_a_line/
```

As you can see, the submission specifies the job name `-jobname fit-a-line`, the CPU resources `-cpu 1`,
the GPU resources `-gpu 1`, the parallelism `-parallelism 1` (the number of trainer nodes), the start command `-entry "python train.py"`,
and the job's program directory `fit_a_line/`.

***Note 1:*** run `paddlecloud submit -h` for the full list of submission options.

***Note 2:*** it is recommended to submit each job under a distinct jobname, so that the code and results of earlier jobs stay saved in the cloud.

## Checking job status and logs

Once the job has started, list running jobs with the command `paddlecloud get jobs`:
```bash
paddlecloud get jobs
NUM NAME SUCC FAIL START COMP ACTIVE
0 fit-a-line <nil> <nil> 2017-06-26T08:41:01Z <nil> 1
```

Here, "ACTIVE" is the number of nodes currently running, "SUCC" the number of nodes that finished successfully, and "FAIL" the number that failed.

Then use the following command to view the logs of a running or completed job:

```bash
paddlecloud logs fit-a-line
Test 28, Cost 13.184950
append file: /pfs/dlnel/public/dataset/uci_housing/train-00000.pickle
append file: /pfs/dlnel/public/dataset/uci_housing/train-00001.pickle
append file: /pfs/dlnel/public/dataset/uci_housing/train-00002.pickle
append file: /pfs/dlnel/public/dataset/uci_housing/train-00003.pickle
append file: /pfs/dlnel/public/dataset/uci_housing/train-00004.pickle
Pass 28, Batch 0, Cost 9.695825
Pass 28, Batch 100, Cost 14.143484
Pass 28, Batch 200, Cost 11.380404
Test 28, Cost 13.184950
...
# The logs command returns the last 10 log lines by default;
# use the -n flag to request more lines:
paddlecloud logs -n 100 fit-a-line
...
```

After the job finishes, its status is displayed as follows:

```bash
paddlecloud get jobs
NUM NAME SUCC FAIL START COMP ACTIVE
0 fit-a-line 1 <nil> 2017-06-26T08:41:01Z 2017-06-26T08:41:29Z <nil>
```

## Downloading the job's model output

After a job completes successfully, the training program usually saves its model output to the cloud filesystem. You can list and download the output with:

```
paddlecloud file ls /pfs/dlnel/home/wuyi05@baidu.com/jobs/fit_a_line/
train.py
image
output
paddlecloud file ls /pfs/dlnel/home/wuyi05@baidu.com/jobs/fit_a_line/output/
pass-0001.tar
...
paddlecloud file get /pfs/dlnel/home/wuyi05@baidu.com/jobs/fit_a_line/output/pass-0001.tar ./
```

Once downloaded, the model can be used in environments such as a prediction service.

## Cleaning up jobs

The following command completely removes a training job from the cluster. After cleanup, the job's history logs can no longer be viewed, but earlier output can still be found under the job-name directory.

```bash
paddlecloud kill fit-a-line
```