English Document Structure #620

Merged
merged 9 commits into from
Nov 29, 2016
Changes from 7 commits
14 changes: 14 additions & 0 deletions doc/about/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
About
=======


Credits
Collaborator

None of the content below has anything to do with credits. "Credit" means "to whom the work is attributed". The two paragraphs below belong under About; the last paragraph, about the authors, belongs under Credits.

Contributor Author

done

--------

PaddlePaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
which was originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.

PaddlePaddle is now open source but far from complete; it is intended to be built upon, improved, scaled, and extended.
We hope to build an active open source community in which people both provide feedback and actively contribute to the source code.

We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/blob/develop/authors>`_ of PaddlePaddle!
7 changes: 0 additions & 7 deletions doc/algorithm/index.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/algorithm/rnn/bi_lstm.jpg

This file was deleted.

1 change: 0 additions & 1 deletion doc/algorithm/rnn/encoder-decoder-attention-model.png

This file was deleted.

File renamed without changes.
14 changes: 14 additions & 0 deletions doc/api/index.md
@@ -0,0 +1,14 @@
# API

## Data Provider

* [Introduction](data_provider/index.rst)
* [PyDataProvider2](data_provider/pydataprovider2.rst)

## Trainer Configuration

* [Model Config Interface](trainer_config_helpers/index.rst)

## Predict
Collaborator

The word "Predict" needs to change. As we said in our last discussion, Paddle is not only used to train supervised models. "Predict" should be changed to "Applications".

Contributor Author

done


* [Python Prediction API](predict/swig_py_paddle_en.rst)
File renamed without changes.
8 changes: 0 additions & 8 deletions doc/cluster/index.rst

This file was deleted.

4 changes: 0 additions & 4 deletions doc/dev/layer.md

This file was deleted.

7 changes: 7 additions & 0 deletions doc/howto/algorithm/index.rst
@@ -0,0 +1,7 @@
Algorithm Configuration
Collaborator

"Algorithm" refers to things like recursion, dynamic programming, divide and conquer, and search with pruning. It has nothing whatsoever to do with RNNs. Is this Chinglish? Did you actually mean "Deep Models"?

Contributor Author

done

=======================

.. toctree::
  :maxdepth: 1

  rnn/rnn.rst
1 change: 1 addition & 0 deletions doc/howto/algorithm/rnn/bi_lstm.jpg
File renamed without changes.
Expand Up @@ -9,7 +9,7 @@ In this article, we explain how to run distributed Paddle training jobs on clust
1. Aforementioned scripts use a Python library [fabric](http://www.fabfile.org/) to run SSH commands. We can use `pip` to install fabric:

```bash
pip install fabric
pip install fabric
```

1. We need to install PaddlePaddle on all nodes in the cluster. To enable GPUs, we need to install CUDA in `/usr/local/cuda`; otherwise Paddle would report errors at runtime.
5 changes: 5 additions & 0 deletions doc/howto/cmd_argument/index.md
@@ -0,0 +1,5 @@
# Command Line Argument
Collaborator

Shouldn't the title be "How to Set Command-line Parameters"?

A parameter and an argument are not the same thing: http://stackoverflow.com/questions/1788923/parameter-vs-argument

Contributor Author

Done


* [Use Case](use_case.md)
* [Argument Outline](argument_outline.md)
Collaborator

Argument Outline ==> Arguments

Contributor Author

done

* [Detailed Descriptions](detail_introduction.md)
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions doc/dev/index.rst → doc/howto/dev/index.rst
Expand Up @@ -2,8 +2,8 @@ Development Guide
=================

.. toctree::
:maxdepth: 1
:maxdepth: 2

layer.md
new_layer/new_layer.rst
../source/index.md
source/index.rst
5 changes: 5 additions & 0 deletions doc/howto/dev/layer.md
@@ -0,0 +1,5 @@
# Layer Documents

* [Layer Python API](../../api/trainer_config_helpers/index.rst)
* [Layer Source Code](source/gserver/layers.rst)
* [Writing New Layers](new_layer/new_layer.rst)
File renamed without changes.
12 changes: 12 additions & 0 deletions doc/howto/index.rst
@@ -0,0 +1,12 @@
How to
=======

.. toctree::
  :maxdepth: 1

  cmd_argument/index.md
  cluster/cluster_train.md
  algorithm/index.rst
  optimization/index.rst
  dev/index.rst
  contribute_to_paddle.md
File renamed without changes.
11 changes: 6 additions & 5 deletions doc/index.rst
Expand Up @@ -4,8 +4,9 @@ PaddlePaddle Documentation
.. toctree::
:maxdepth: 1

introduction/index.md
user_guide.rst
dev/index.rst
algorithm/index.rst
optimization/index.rst
introduction/index.rst
Collaborator

The documentation structure we discussed is supposed to be exactly the same as Tensorflow's, right? Tensorflow has no section called Introduction, so why is there one here?

Tensorflow's documentation has a section called Get Started. Why is there none here?

Contributor Author

I misread it. It should indeed be Get Started. Done.

tutorials/index.md
howto/index.rst
api/index.rst
about/index.rst

109 changes: 109 additions & 0 deletions doc/introduction/basic_usage/basic_usage.rst
@@ -0,0 +1,109 @@
Basic Usage
Collaborator

Is this basic usage meant to be part of the Get Started section?

Contributor Author

Yes. As with tf, once installation is finished there should be a simple example, so linear regression is placed here.

=============

PaddlePaddle is a deep learning platform open-sourced by Baidu. With PaddlePaddle, you can easily train a classic neural network with just a couple of lines of configuration, or you can build sophisticated models that provide state-of-the-art performance on difficult learning tasks such as sentiment analysis, machine translation, and image captioning.
Collaborator

I see an introduction to Paddle in many of the documents. It does not need to be repeated in Get Started. Please delete it.

Please also take a look at how Tensorflow writes its Get Started:

[screenshot: screen shot 2016-11-28 at 10 32 47 am]

Contributor Author

@zhouxiao-coder The original author said they will revise it personally.

Collaborator

This paragraph can be simplified to:

Let's run Paddle to learn a very simple linear regression model!


1. A Classic Problem
---------------------

Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - `simple linear regression <https://en.wikipedia.org/wiki/Simple_linear_regression>`_: you have observed a set of two-dimensional data points of ``X`` and ``Y``, where ``X`` is an explanatory variable and ``Y`` is the corresponding dependent variable, and you want to recover the underlying correlation between ``X`` and ``Y``. Linear regression can be used in many practical scenarios. For example, ``X`` can be a variable about house size, and ``Y`` a variable about house price. You can build a model that captures the relationship between them by observing real estate markets.
Collaborator

This paragraph duplicates the description at https://en.wikipedia.org/wiki/Simple_linear_regression. I suggest using Wikipedia's original wording; just copy it over.

[screenshot: screen shot 2016-11-28 at 10 34 22 am]

Contributor Author

@gangliao gangliao Nov 29, 2016

@wangkuiyi I did not actually write this article; I only adjusted the location of the document and its formatting style, which is why it shows up as newly added.
How about a simple fix for now, and then a separate PR later to revise it on its own?

Collaborator

Understood. OK.


2. Prepare the Data
Collaborator

"Prepare the Data" is ambiguous. No data has been mentioned before this point, so what is "the data"?

Also, what this section actually covers is Load the Training Data, so just use that as the title.

--------------------

Suppose the true relationship can be characterized as ``Y = 2X + 0.3``. Let's see how to recover this pattern from observed data alone. Here is a piece of Python code that feeds synthetic data to PaddlePaddle. The code is largely self-explanatory; the only extra thing PaddlePaddle requires is a definition of the input data types.
Collaborator

The point of this section is to explain that a Paddle job usually reads its data through a data provider written in Python. So the first sentence of the first paragraph should introduce the data provider. For example:

A PaddlePaddle job usually loads the training data by implementing a Python data provider. A data provider is a Python function that is called by the PaddlePaddle trainer program, so it can adapt to any data format. We can write a data provider to read from the local filesystem, HDFS, databases, S3, or almost anywhere. In this example, our data provider synthesizes the training data by sampling from the line Y = 2X + 0.3.


.. code-block:: python

    # dataprovider.py
    from paddle.trainer.PyDataProvider2 import *
    import random

    # define data types of input: 2 real numbers
    @provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False)
    def process(settings, input_file):
        for i in xrange(2000):
            x = random.random()
            yield [x], [2*x+0.3]
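
To inspect what the provider above yields without installing PaddlePaddle, the same sampling logic can be sketched as a plain Python 3 generator. The name `synthetic_samples` and its `n` and `seed` parameters are hypothetical and only serve this illustration; the `@provider` decorator and the input type declaration are deliberately left out.

```python
import random


def synthetic_samples(n=2000, seed=None):
    """Yield ([x], [y]) pairs sampled from the line y = 2x + 0.3.

    Hypothetical stand-in for the `process` provider; it has no
    dependency on PaddlePaddle.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        # Each sample is a pair of one-element lists, matching the two
        # one-dimensional dense vectors declared by the provider.
        yield [x], [2 * x + 0.3]


# Print a few samples to see their shape.
for features, label in synthetic_samples(n=3, seed=0):
    print(features, label)
```

Each yielded item has one entry per declared input, in the same order as `input_types`.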

3. Train a NeuralNetwork
Collaborator

NeuralNetwork ==> Neural Network

-------------------------

To recover this relationship between ``X`` and ``Y``, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with this terminology; it simply means that we start from a random line ``Y' = wX + b`` and then gradually adapt ``w`` and ``b`` to minimize the difference between ``Y'`` and ``Y``. Here is what it looks like in PaddlePaddle:

.. code-block:: python

    # trainer_config.py
    from paddle.trainer_config_helpers import *

    # 1. read data. Suppose you saved above python code as dataprovider.py
    data_file = 'empty.list'
    with open(data_file, 'w') as f: f.writelines(' ')
    define_py_data_sources2(train_list=data_file, test_list=None,
            module='dataprovider', obj='process',args={})

    # 2. learning algorithm
    settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer())

    # 3. Network configuration
    x = data_layer(name='x', size=1)
    y = data_layer(name='y', size=1)
    y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
    cost = regression_cost(input=y_predict, label=y)
    outputs(cost)

This configuration demonstrates some of the most fundamental usages of PaddlePaddle:

- The first part shows how to feed data into PaddlePaddle. In general, PaddlePaddle reads raw data from a list of files and then applies user-defined processing to produce the actual input. In this case, we only need to create a placeholder file, since we generate synthetic data on the fly.

- The second part describes the learning algorithm. It defines how adjustments are made to the model parameters. PaddlePaddle provides a rich set of optimizers, but a simple momentum-based optimizer suffices here, and it processes 12 data points at a time.

- Finally, the network configuration. It is usually as simple as "stacking" layers. Three kinds of layers are used in this configuration:

  - **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used, one for ``X`` and one for ``Y``.
  - **FC Layer**: FC layer is short for Fully Connected Layer, which connects all the input units to the current layer and performs the actual computation specified by the activation function. Computation layers like this are the fundamental building blocks of a deeper model.
  - **Cost Layer**: in the training phase, cost layers are usually the last layers of the network. They measure the performance of the current model and provide guidance for adjusting the parameters.
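
What this configuration asks the trainer to do (start from an initial line and gradually adapt ``w`` and ``b`` to reduce the square error) can be sketched in plain Python 3, independent of PaddlePaddle. This is only an illustration, not PaddlePaddle's implementation: it uses ordinary mini-batch gradient descent instead of the momentum optimizer, and the names `make_data` and `train`, the zero initialization, and the learning rate of 0.1 are assumptions chosen for the sketch.

```python
import random


def make_data(n=2000, seed=0):
    """Sample n points from the line y = 2x + 0.3 (hypothetical helper)."""
    rng = random.Random(seed)
    return [(x, 2 * x + 0.3) for x in (rng.random() for _ in range(n))]


def train(data, batch_size=12, learning_rate=0.1, num_passes=30):
    """Fit y' = w*x + b by mini-batch gradient descent on the square error."""
    w, b = 0.0, 0.0  # assumed starting point; a random start works as well
    for _ in range(num_passes):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = grad_b = 0.0
            for x, y in batch:
                error = (w * x + b) - y
                # Derivatives of error**2 with respect to w and b.
                grad_w += 2 * error * x
                grad_b += 2 * error
            # Step against the average gradient of the batch.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
    return w, b


w, b = train(make_data())
print("w=%.4f, b=%.4f" % (w, b))
```

The batch size of 12 and the 30 passes mirror the values used in this tutorial; the learning rate does not, because plain gradient descent without momentum needs a larger step to converge within the same number of passes.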

Now that everything is ready, you can train the network with a simple command line call:

.. code-block:: bash

    paddle train --config=trainer_config.py --save_dir=./output --num_passes=30


This means that PaddlePaddle will train this network on the synthetic dataset for 30 passes and save all the models under the path ``./output``. The messages printed during training show the model cost decreasing over time, which indicates that the guess is getting closer.


4. Evaluate the Model
-----------------------

Usually, a separate dataset that was left out during the training phase should be used to evaluate the models. However, we are lucky enough to know the real answer: ``w=2, b=0.3``, so a better option is to check the model parameters directly.

In PaddlePaddle, training simply produces a collection of model parameters, which are ``w`` and ``b`` in this case. Each parameter is saved in an individual binary file that can be read with ``numpy``. Here is the code that reads the parameters from the last pass.

.. code-block:: python

    import numpy as np
    import os

    def load(file_name):
        with open(file_name, 'rb') as f:
            f.read(16) # skip header for float type.
            return np.fromfile(f, dtype=np.float32)

    print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b'))
    # w=1.999743, b=0.300137
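
The loading step can be tried without a trained model by writing a file with the same layout the code above assumes: 16 header bytes followed by raw ``float32`` values. The sketch below is written for Python 3. The header contents are not described in this document, so the zero bytes written here are a placeholder, and `save_parameter` and `load_parameter` are hypothetical helper names.

```python
import os
import tempfile

import numpy as np

HEADER_SIZE = 16  # number of bytes skipped by the loader above


def save_parameter(path, values):
    """Write a placeholder header followed by raw float32 values."""
    with open(path, "wb") as f:
        f.write(b"\x00" * HEADER_SIZE)  # placeholder, not the real header
        f.write(np.asarray(values, dtype=np.float32).tobytes())


def load_parameter(path):
    """Skip the header and read the remaining bytes as float32."""
    with open(path, "rb") as f:
        f.read(HEADER_SIZE)
        return np.fromfile(f, dtype=np.float32)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "w")
    save_parameter(path, [1.999743])
    w = load_parameter(path)
    print("w=%.6f" % w[0])
```

The loader returns a one-dimensional array, so a scalar parameter such as ``w`` is read as an array with a single element.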

.. image:: parameters.png
    :align: center

Although training starts from a random guess, you can see that the value of ``w`` quickly approaches 2 and ``b`` quickly approaches 0.3. In the end, the predicted line is almost identical to the real answer.

There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data.
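
When the true parameters are not known, the usual approach mentioned at the start of this section is to measure the error on data that was left out of training. A minimal Python 3 sketch of that check follows; the helper `mean_squared_error` and the 200 synthetic held-out points are assumptions made for the illustration, and the learned values are the ones printed by the loading code above.

```python
import random


def mean_squared_error(w, b, samples):
    """Average squared difference between w*x + b and the observed y."""
    errors = [(w * x + b - y) ** 2 for x, y in samples]
    return sum(errors) / len(errors)


# Held-out points drawn from the same line, with a seed of their own.
rng = random.Random(1)
held_out = [(x, 2 * x + 0.3) for x in (rng.random() for _ in range(200))]

learned = mean_squared_error(1.999743, 0.300137, held_out)
untrained = mean_squared_error(0.0, 0.0, held_out)
print("learned: %.2e, untrained: %.2e" % (learned, untrained))
```

A held-out error that is far smaller for the learned parameters than for an untrained starting point points to the same conclusion as inspecting ``w`` and ``b`` directly.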


5. Where to Go from Here
-------------------------

- `Install and Build <../build_and_install/index.html>`_
- `Tutorials <../demo/quick_start/index_en.html>`_
- `Example and Demo <../demo/index.html>`_
Binary file added doc/introduction/basic_usage/parameters.png
File renamed without changes
Expand Up @@ -8,8 +8,6 @@ Install PaddlePaddle
:maxdepth: 1
:glob:

install_*
internal/install_from_jumbo.md
docker_install.rst
ubuntu_install.rst

Expand All @@ -24,5 +22,4 @@ Build from Source
:maxdepth: 1
:glob:

build_from_source.md
contribute_to_paddle.md
build_from_source.md
100 changes: 0 additions & 100 deletions doc/introduction/index.md

This file was deleted.

8 changes: 8 additions & 0 deletions doc/introduction/index.rst
@@ -0,0 +1,8 @@
Introduction
============

.. toctree::
  :maxdepth: 2

  build_and_install/index.rst
  basic_usage/basic_usage.rst
1 change: 0 additions & 1 deletion doc/introduction/parameters.png

This file was deleted.

2 changes: 1 addition & 1 deletion doc/demo/index.md → doc/tutorials/index.md
@@ -1,4 +1,4 @@
# Examples and demos
# Tutorials
There are several examples and demos here.

## Image
File renamed without changes.