English Document Structure #620
Changes from 7 commits
64f4e54
a48f19c
345c626
80fb911
5e7e5cc
aa6c6dc
74c48d1
7a846e4
5c551f9
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
About | ||
======= | ||
|
||
|
||
Credits | ||
-------- | ||
|
||
PaddlePaddle is an easy-to-use, efficient, flexible and scalable deep learning platform, | ||
which was originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu. | ||
|
||
PaddlePaddle is now open source but far from complete; it is intended to be built upon, improved, scaled, and extended. | ||
We hope to build an active open source community in which users both provide feedback and actively contribute to the source code. | ||
|
||
We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/blob/develop/authors>`_ of PaddlePaddle! |
This file was deleted.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# API | ||
|
||
## Data Provider | ||
|
||
* [Introduction](data_provider/index.rst) | ||
* [PyDataProvider2](data_provider/pydataprovider2.rst) | ||
|
||
## Trainer Configuration | ||
|
||
* [Model Config Interface](trainer_config_helpers/index.rst) | ||
|
||
## Predict | ||
Review comment: The word "Predict" needs to change. As we said in our last discussion, Paddle is not only used for training supervised models. "Predict" should be changed to "Applications". Reply: Done. |
||
|
||
* [Python Prediction API](predict/swig_py_paddle_en.rst) |
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Algorithm Configuration | ||
Review comment: "Algorithm" refers to things like recursion, dynamic programming, divide and conquer, and search pruning. It has nothing to do with RNNs. Is this Chinglish? Is what you actually mean "Deep Models"? Reply: Done. |
||
======================= | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
rnn/rnn.rst |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../../tutorials/sentiment_analysis/bi_lstm.jpg | ||
Review comment: Is this file a symbolic link? Why does an image file need a symbolic link? If it is referenced from an rst article, then reference it in the rst file Reply: This symbolic link was already there before this change; I also think it can be removed. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../../tutorials/text_generation/encoder-decoder-attention-model.png | ||
Review comment: Same issue as above. Reply: Done. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Command Line Argument | ||
Review comment: Shouldn't the title be "How to Set Command-line Parameters"? A parameter and an argument are not the same thing: http://stackoverflow.com/questions/1788923/parameter-vs-argument Reply: Done. |
||
|
||
* [Use Case](use_case.md) | ||
* [Argument Outline](argument_outline.md) | ||
Review comment: Argument Outline ==> Arguments Reply: Done. |
||
* [Detailed Descriptions](detail_introduction.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Layer Documents | ||
|
||
* [Layer Python API](../../api/trainer_config_helpers/index.rst) | ||
* [Layer Source Code](source/gserver/layers.rst) | ||
* [Writing New Layers](new_layer/new_layer.rst) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
How to | ||
======= | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
cmd_argument/index.md | ||
cluster/cluster_train.md | ||
algorithm/index.rst | ||
optimization/index.rst | ||
dev/index.rst | ||
contribute_to_paddle.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,8 +4,9 @@ PaddlePaddle Documentation | |
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
introduction/index.md | ||
user_guide.rst | ||
dev/index.rst | ||
algorithm/index.rst | ||
optimization/index.rst | ||
introduction/index.rst | ||
Review comment: The document structure we discussed is supposed to match TensorFlow's exactly, right? TensorFlow has no section called Introduction, so why is there one here? TensorFlow's documentation has a section called Get Started, so why is there none here? Reply: I misread it; it should indeed be Get Started. Done. |
||
tutorials/index.md | ||
howto/index.rst | ||
api/index.rst | ||
about/index.rst | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
Basic Usage | ||
Review comment: Is this basic usage meant to be the Get Started part? Reply: Yes. As with TensorFlow, there should be a simple example once installation is finished, so linear regression is placed here. |
||
============= | ||
|
||
PaddlePaddle is a deep learning platform open-sourced by Baidu. With PaddlePaddle, you can train a classic neural network with a couple of lines of configuration, or build sophisticated models that provide state-of-the-art performance on difficult learning tasks such as sentiment analysis, machine translation, and image captioning. | ||
Review comment: @zhouxiao-coder The original author said they would revise this themselves. Review comment: This paragraph can be simplified to:
|
||
|
||
1. A Classic Problem | ||
--------------------- | ||
|
||
To give you a sense of what using PaddlePaddle looks like, let's start with a fundamental learning problem, `simple linear regression <https://en.wikipedia.org/wiki/Simple_linear_regression>`_: you have observed a set of two-dimensional data points of ``X`` and ``Y``, where ``X`` is an explanatory variable and ``Y`` is the corresponding dependent variable, and you want to recover the underlying correlation between ``X`` and ``Y``. Linear regression can be used in many practical scenarios. For example, ``X`` can be a variable about house size, and ``Y`` a variable about house price. You can build a model that captures the relationship between them by observing real estate markets. | ||
Review comment: This paragraph duplicates the description at https://en.wikipedia.org/wiki/Simple_linear_regression. I suggest simply using the original Wikipedia wording; copying it is enough. Reply: @wangkuiyi I did not actually write this article; I only changed where the document lives and its formatting style, which is why it shows up as newly added. Reply: Understood. OK. |
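For comparison, simple linear regression also has a closed-form least-squares solution. The sketch below is not how PaddlePaddle solves the problem; it is a plain Python 3 baseline, and the house sizes, prices, and the ``fit_line`` helper are made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


random.seed(0)
# Hypothetical data: price = 3 * size + 20, plus Gaussian noise.
sizes = [random.uniform(50.0, 200.0) for _ in range(100)]
prices = [3.0 * s + 20.0 + random.gauss(0.0, 5.0) for s in sizes]

slope, intercept = fit_line(sizes, prices)
print("slope=%.3f intercept=%.3f" % (slope, intercept))
```

A gradient-based learner such as the one configured later in this document should approach the same line when the model is a single linear unit with a squared-error cost.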
||
|
||
2. Prepare the Data | ||
Review comment: "Prepare the Data" is unclear: no data has been mentioned so far, so what does "the data" refer to? Also, what this section actually covers is loading the training data, so "Load the Training Data" would work as the title. |
||
-------------------- | ||
|
||
Suppose the true relationship can be characterized as ``Y = 2X + 0.3``. Let's see how to recover this pattern from observed data alone. Here is a piece of Python code that feeds synthetic data to PaddlePaddle. The code is largely self-explanatory; the only extra thing you need to add for PaddlePaddle is a definition of the input data types. | ||
Review comment: The point of this section is to explain that a Paddle job usually reads its data through a data provider written in Python. So the first sentence of the first paragraph should introduce the data provider. For example:
|
||
|
||
.. code-block:: python | ||
|
||
# dataprovider.py | ||
from paddle.trainer.PyDataProvider2 import * | ||
import random | ||
|
||
# define data types of input: 2 real numbers | ||
@provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False) | ||
def process(settings, input_file): | ||
    for i in xrange(2000): | ||
        x = random.random() | ||
        yield [x], [2*x+0.3] | ||
|
||
3. Train a Neural Network | ||
Review comment: NeuralNetwork ==> Neural Network |
||
------------------------- | ||
|
||
To recover this relationship between ``X`` and ``Y``, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terms; they simply mean that we start from a random line ``Y' = wX + b`` and then gradually adapt ``w`` and ``b`` to minimize the difference between ``Y'`` and ``Y``. Here is what it looks like in PaddlePaddle: | ||
|
||
.. code-block:: python | ||
|
||
# trainer_config.py | ||
from paddle.trainer_config_helpers import * | ||
|
||
# 1. read data. Suppose you saved above python code as dataprovider.py | ||
data_file = 'empty.list' | ||
with open(data_file, 'w') as f: f.writelines(' ') | ||
define_py_data_sources2(train_list=data_file, test_list=None, | ||
module='dataprovider', obj='process',args={}) | ||
|
||
# 2. learning algorithm | ||
settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer()) | ||
|
||
# 3. Network configuration | ||
x = data_layer(name='x', size=1) | ||
y = data_layer(name='y', size=1) | ||
y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) | ||
cost = regression_cost(input=y_predict, label=y) | ||
outputs(cost) | ||
|
||
This configuration demonstrates some of the most fundamental uses of PaddlePaddle: | ||
|
||
- The first part shows how to feed data into PaddlePaddle. In general, PaddlePaddle reads raw data from a list of files and then applies user-defined processing to produce the actual input. In this case, we only need to create a placeholder file, since we generate synthetic data on the fly. | ||
|
||
- The second part describes the learning algorithm, which defines how adjustments are made to the model parameters. PaddlePaddle provides a rich set of optimizers, but a simple momentum-based optimizer suffices here; it processes 12 data points at a time. | ||
|
||
- The third part is the network configuration, which is usually as simple as "stacking" layers. Three kinds of layers are used in this configuration: | ||
  - **Data Layer**: a network always starts with one or more data layers, which provide input data to the rest of the network. In this problem, two data layers are used, one for ``X`` and one for ``Y``. | ||
  - **FC Layer**: short for Fully Connected Layer, it connects all the input units to the current layer and performs the computation specified by the activation function. Computation layers like this are the fundamental building blocks of a deeper model. | ||
  - **Cost Layer**: in the training phase, cost layers are usually the last layers of the network. They measure the performance of the current model and provide guidance for adjusting parameters. | ||
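To make the three parts concrete, here is a minimal sketch in plain Python 3 of what such a configuration asks the trainer to do: fit ``Y' = wX + b`` with mini-batches of 12 and a momentum update. It does not use PaddlePaddle, and the learning rate and momentum coefficient are illustrative assumptions rather than the values or the exact update rule of ``MomentumOptimizer``.

```python
import random

random.seed(0)
# Same synthetic data as the data provider: y = 2x + 0.3 with x in [0, 1).
data = [(x, 2 * x + 0.3) for x in (random.random() for _ in range(2000))]

w, b = random.uniform(-1.0, 1.0), 0.0      # start from a random line
vw, vb = 0.0, 0.0                          # momentum buffers
lr, momentum, batch_size = 0.05, 0.9, 12   # assumed values, for illustration only

for _ in range(30):                        # 30 passes over the data
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of the mean of 0.5 * (w*x + b - y)^2 over the batch.
        gw = sum((w * x + b - y) * x for x, y in batch) / len(batch)
        gb = sum((w * x + b - y) for x, y in batch) / len(batch)
        vw = momentum * vw - lr * gw
        vb = momentum * vb - lr * gb
        w += vw
        b += vb

print("w=%.4f, b=%.4f" % (w, b))
```

The data layers correspond to the ``(x, y)`` pairs, the fully connected layer to ``w * x + b``, and the cost layer to the squared error whose gradient drives the update.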
|
||
Now that everything is ready, you can train the network with a simple command line call: | ||
|
||
.. code-block:: bash | ||
|
||
paddle train --config=trainer_config.py --save_dir=./output --num_passes=30 | ||
|
||
|
||
This means that PaddlePaddle will train this network on the synthetic dataset for 30 passes and save all the models under the path ``./output``. The messages printed during training show the model cost decreasing over time, which indicates that the guess is getting closer. | ||
|
||
|
||
4. Evaluate the Model | ||
----------------------- | ||
|
||
Usually, a different dataset that was left out during the training phase should be used to evaluate the models. However, we are lucky enough to know the real answer, ``w=2, b=0.3``, so a better option is to check the model parameters directly. | ||
|
||
In PaddlePaddle, training simply produces a collection of model parameters, which are ``w`` and ``b`` in this case. Each parameter is saved in an individual file in the popular ``numpy`` array format. Here is the code that reads the parameters from the last pass. | ||
|
||
.. code-block:: python | ||
|
||
import numpy as np | ||
import os | ||
|
||
def load(file_name): | ||
    with open(file_name, 'rb') as f: | ||
        f.read(16) # skip header for float type. | ||
        return np.fromfile(f, dtype=np.float32) | ||
|
||
print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b')) | ||
# w=1.999743, b=0.300137 | ||
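The loader above can be tried without a trained model. The following Python 3 sketch writes a stand-in file and reads it back using only the standard library. The only thing taken from the example is that 16 header bytes are skipped before the ``float32`` payload; the header contents, the little-endian byte order, and the ``load_floats`` helper are assumptions made for illustration.

```python
import os
import struct
import tempfile

HEADER_SIZE = 16  # bytes skipped by the loader above; header contents are not modeled


def load_floats(file_name):
    """Skip the header and decode the rest as little-endian float32 values."""
    with open(file_name, "rb") as f:
        f.seek(HEADER_SIZE)
        payload = f.read()
    count = len(payload) // 4
    return list(struct.unpack("<%df" % count, payload[:count * 4]))


# Write a stand-in file: 16 placeholder bytes followed by one float32 value.
path = os.path.join(tempfile.mkdtemp(), "w")
with open(path, "wb") as f:
    f.write(b"\x00" * HEADER_SIZE)
    f.write(struct.pack("<f", 2.0))

values = load_floats(path)
print(values)
```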
|
||
.. image:: parameters.png | ||
:align: center | ||
|
||
Although training starts from a random guess, you can see that the value of ``w`` moves quickly towards 2 and ``b`` moves quickly towards 0.3. In the end, the predicted line is almost identical to the real answer. | ||
|
||
There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data. | ||
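When the true parameters are not known, the usual check is the error on data that was left out of training. A minimal Python 3 sketch of that check, using the ``w`` and ``b`` values printed in the example above and freshly generated points (the ``mean_squared_error`` helper and the sample count are illustrative choices):

```python
import random


def mean_squared_error(w, b, samples):
    """Average squared difference between the predicted line and the labels."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


random.seed(1)
# Fresh points from the true relationship, never used for training.
held_out = [(x, 2 * x + 0.3) for x in (random.random() for _ in range(500))]

w, b = 1.999743, 0.300137  # values printed in the example above
mse = mean_squared_error(w, b, held_out)
print("held-out MSE: %.3e" % mse)
```

A held-out error close to zero indicates that the learned line generalizes to unseen points rather than merely fitting the training samples.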
|
||
|
||
5. Where to Go from Here | ||
------------------------- | ||
|
||
- `Install and Build <../build_and_install/index.html>`_ | ||
- `Tutorials <../demo/quick_start/index_en.html>`_ | ||
- `Example and Demo <../demo/index.html>`_ |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
Introduction | ||
============ | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
|
||
build_and_install/index.rst | ||
basic_usage/basic_usage.rst |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Examples and demos | ||
# Tutorials | ||
There are several examples and demos here. | ||
|
||
## Image | ||
|
Review comment: None of the content below has anything to do with credit. "Credit" means "whom the work is attributed to". The two paragraphs below belong under About, and the last paragraph, about the authors, belongs under Credit.
Reply: Done.