
Paddle API v4 proposal #10152

Closed
helinwang opened this issue Apr 24, 2018 · 14 comments

@helinwang
Contributor

helinwang commented Apr 24, 2018

PyTorch uses Python to call each operator; the following API uses Python to call each compiled function.

fit_a_line.py

class FitALine(fluid.Program):
  DEFAULT_READER_FILE_PATH = './data.recordio'
  DEFAULT_READER_BATCH_SIZE = 128

  # The @network decorator will be used by the compiler to generate a
  # ProgramDesc block. It can optionally take inputs, which represent a
  # mapping from the input vars to vars in the block.
  @network()
  def train_step(self):
    reader = fluid.batch_reader(file=self.DEFAULT_READER_FILE_PATH,
                                batch_size=self.DEFAULT_READER_BATCH_SIZE,
                                shape=[[13], [1]],
                                dtype=['float32', 'float32'],
                                format='recordio')
    x, y = reader.next_item()
    with fluid.var_scope('prediction'):
      # Since we want to be able to access the same weights and bias during
      # inference, we need to namespace the variables.
      # The fluid.var_scope block guard creates a new UniqueNameGenerator when
      # we enter the block, and rolls back to the previous UniqueNameGenerator
      # when the block exits.
      y_predict = fluid.layers.fc(input=x, size=1, act=None)

    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost, name='avg_cost')
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    sgd_optimizer.minimize(avg_cost)
    return avg_cost

  @network("x")
  def infer(self):
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    with fluid.var_scope('prediction'):
      return fluid.layers.fc(input=x, size=1, act=None)

main.py

fit_a_line = FitALine(batch_size=256).Compile()

for i in range(1000):
  avg_cost = fit_a_line.train_step()

y_results = fit_a_line.infer([3,4,6,3,5,7,8,6,5,4,1,5,8])

Transpiler

Trainer

> TRAINING_ROLE=TRAINER python main.py --distributed main=train_step

PServer

> TRAINING_ROLE=PSERVER paddle run train.py --distributed main=train_step
@helinwang changed the title from "Paddle API v4 - fetch data" to "Paddle API v4 - fetch data during training steps" on Apr 24, 2018
@cs2be
Contributor

cs2be commented Apr 24, 2018

This looks good. One question: when is the reader initialized? When the first iteration is run? We probably need to do this implicitly, maybe by creating a static or shared reader?
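For illustration, one way to do that implicitly would be to create the reader lazily on first use and share it across calls (the _shared_reader attribute and _get_reader helper below are hypothetical, not part of the proposal above):

class FitALine(fluid.Program):
  _shared_reader = None  # created once, reused by every train_step call

  def _get_reader(self):
    # Lazily create the shared reader the first time it is needed.
    if FitALine._shared_reader is None:
      FitALine._shared_reader = fluid.batch_reader(
          file=self.DEFAULT_READER_FILE_PATH,
          batch_size=self.DEFAULT_READER_BATCH_SIZE,
          shape=[[13], [1]],
          dtype=['float32', 'float32'],
          format='recordio')
    return FitALine._shared_reader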

@abhinavarora
Contributor

Is the assumption here that we can return a tuple if we want to capture multiple variables during the training steps? For example, we should be able to do return y_predict, avg_cost.
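Assuming that is allowed, the usage might look roughly like this (a sketch, not part of the proposal above):

  @network()
  def train_step(self):
    # ... same body as above ...
    return y_predict, avg_cost

# main.py: unpack both fetched values per step.
y_predict, avg_cost = fit_a_line.train_step()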

@varunarora

varunarora commented Apr 24, 2018

How about:

fit_a_line.py

...
  def train_step(self, batch_size):
    ...
    return self.program({ 'avg_cost': avg_cost })
...

main.py

from fit_a_line import FitALine

trainer = FitALine().train_step(batch_size=256)

for i in range(1000):
  cost = trainer()['avg_cost']

y_results = FitALine().infer([3,4,6,3,5,7,8,6,5,4,1,5,8])

@helinwang changed the title from "Paddle API v4 - fetch data during training steps" to "Paddle API v4 - explicit initialization, and fetch data during training steps" on Apr 24, 2018
@wangkuiyi
Collaborator

Thanks for this design, @cs2be and @helinwang! I have a few questions:

  1. Should users be able to define arbitrary methods with arbitrary names and call program.run('method_name'), or should only a few pre-determined methods with certain names be overloadable in class FitALine? If it is the latter, could we have a list of the pre-determined methods that are overloadable?

  2. It seems, and I just want to confirm, that each class (FitALine) will be compiled into a ProgramDesc and each method in the class into a block?

  3. How could we map method arguments to a block's inputs? Do we need to define RecordIO read operators in the blocks, and assume that each argument corresponds to a field?

@cs2be
Contributor

cs2be commented Apr 24, 2018

Hi @wangkuiyi,

  1. This is a good point; we are still debating this, since the transpiler may need to know which method the main program is in. Currently, for the transpilation step, we let users define which method is the main program.

  2. We are thinking that each method will be compiled to a ProgramDesc. This will allow users to run them in any order they want. We still need to refine this idea.

  3. This is a good point; I'll discuss with the team to see what a good solution for this is.

@helinwang
Contributor Author

helinwang commented Apr 25, 2018

Thanks for reviewing, @wangkuiyi! @cs2be and I discussed this; here are our replies:

Should users be able to define arbitrary methods with arbitrary names and call program.run('method_name'), or should only a few pre-determined methods with certain names be overloadable in class FitALine? If it is the latter, could we have a list of the pre-determined methods that are overloadable?

Many thanks to @cs2be for coming up with this idea. I think the power of this idea comes from the fact that it enables state to be shared across many methods:

program = FitALine().Compile()
# program.train knows the block ID,
# so program desc does not have to store function name.
program.train()
program.infer()

In the above code, train and infer share the same state (scope).

The user could come up with many things that they want to do: train, infer, train_step, save_model, load_model, ...
So I would prefer that we give the user the flexibility to define whatever methods they want. It may be more coherent with the programming-language idea: a programmer can write any method in their favorite programming language.

It seems, and I just want to confirm, that each class (FitALine) will be compiled into a ProgramDesc and each method in the class a block?

Yes, that is correct.

How could we map method arguments to block's inputs? Do we need to define RecordIO read operators in the blocks, and assume that each argument corresponds to a field?

Great point. @cs2be and I have discussed this and updated the example. The block input will be a var created by fluid.layers.data. The name mapping is indicated by @Input('x'), so program.infer([3,4,6,3,5,7,8,6,5,4,1,5,8]) knows which var to set [3,4,6,3,5,7,8,6,5,4,1,5,8] to.

  @Input('x')
  def infer(x):
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    return fluid.layers.fc(input=x, size=1, act=None)

Summary: PyTorch uses Python to call each operator; this API uses Python to call each compiled function.

@panyx0718
Contributor

I think this proposal looks mostly good!

One question:

> TRAINING_ROLE=TRAINER python main.py --distributed main=train_step

How does the Compile() logic magically know how to transpile a program into a trainer part and a parameter-server part?

In more detail, how does the transpiler know which operators and variables should be placed on the trainer and which on the parameter servers?

@helinwang
Contributor Author

@panyx0718 thanks for the review!

How does the Compile() logic magically know how to transpile a program into a trainer part and a parameter-server part?
In more detail, how does the transpiler know which operators and variables should be placed on the trainer and which on the parameter servers?

Our current transpiler implementation does "magically" make everything work. Indeed, it is hacky, because a lot of assumptions have been made. To improve it, the transpiler needs to know more information, such as "which operators belong to the optimizer", "which operators belong to the gradient-calculating pass", and "which operators are explicitly defined by the user". I think we should store this information in the program desc.
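To illustrate the idea only (none of the names below are real Fluid APIs; the Op class, the role constants, and split_for_distributed are made up for this sketch): if each op in the program desc recorded which pass created it, the split would not need heuristics.

from dataclasses import dataclass

FORWARD, BACKWARD, OPTIMIZE = 'forward', 'backward', 'optimize'

@dataclass
class Op:
  type: str
  role: str  # which pass created this op

def split_for_distributed(ops):
  # Optimizer ops go to the parameter server; forward/backward ops stay on the trainer.
  trainer_ops = [op for op in ops if op.role != OPTIMIZE]
  pserver_ops = [op for op in ops if op.role == OPTIMIZE]
  return trainer_ops, pserver_ops

ops = [Op('mul', FORWARD), Op('mul_grad', BACKWARD), Op('sgd', OPTIMIZE)]
trainer_ops, pserver_ops = split_for_distributed(ops)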

Do you think the API makes sense, regardless of the current transpiler implementation (or assuming we had implemented it in a non-hacky way)?

@abhinavarora
Contributor

Can we change fluid.var_scope to fluid.scope? Variable is a C++ concept in Paddle and should not be exposed to Python users.

@helinwang changed the title from "Paddle API v4 - explicit initialization, and fetch data during training steps" to "Paddle API v4 proposal" on Apr 26, 2018
@panyx0718
Contributor

panyx0718 commented Apr 26, 2018

@helinwang

I think the skeleton looks fine, though I still don't know how to achieve the following with the "--distributed" option:

the transpiler needs to know more information, such as "which operators belongs to the optimizers", "which operators belong to the gradient calculating pass", "which operators are explicitly defined by the user"

There can be several configuration options for "distributed" training.

@panyx0718
Contributor

@PaddleCI

Other Paddle team members can take a look at this proposal.

@jacquesqiao
Member

Can @network() support training multiple networks, like a GAN?

@helinwang
Contributor Author

helinwang commented Apr 26, 2018

@jacquesqiao thanks for reviewing! Yes, I think so. Multiple class methods annotated with @network() can share variables by using the block guard fluid.var_scope("shared_name_scope") when writing the Fluid program.
The "compiled program" generated by fluid.Program.Compile owns the scope, which contains the parameters of however many networks there are.

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
