How to describe and use Network #1315

Closed
wangkuiyi opened this issue Feb 11, 2017 · 9 comments

@wangkuiyi
Collaborator

We had thought that a DL framework should implement concepts like model and cost, but we realized that these are not flexible enough to describe deep learning problems. Instead, we need the concept of a network. For more about this derivation, please refer to #1311.

In this issue, we are going to figure out how we should build a network and its parameters, and how we can train a network and use part of it (the model) for inference/serving.

@wangkuiyi wangkuiyi self-assigned this Feb 11, 2017
@wangkuiyi
Collaborator Author

wangkuiyi commented Feb 11, 2017

Here is a summary of an idea from @helinwang and @emailweixu that changes the concepts listed in #1297 into the following:

  1. No concept of Model; instead, we introduce Network. The reason is listed here.
  2. A Network consists of a topology and parameters, but Network itself is not the essential concept; the topology and parameters are.
  3. Layers in the same network might share parameters; an example is shown here, and
  4. Layers of different networks might share parameters too, as the GAN example presented later will show.

For how to describe networks and how to use them for convenient training, testing, and inference/serving, please see the following comments.
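A minimal sketch of point 2, the separation of topology and parameters, using the API names proposed in the examples below; the layer names and some_reader are only illustrative:

x = paddle.layer.data(input_name="x")
y = paddle.layer.fc(x, parameter_name="w")        # the topology describes computation only
parameters = paddle.parameters.create(y)          # parameters are created and stored separately
paddle.train(y, parameters, reader=some_reader)   # training binds a topology to a parameter set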

@wangkuiyi wangkuiyi changed the title How to express and use Network How to describe and use Network Feb 11, 2017
@wangkuiyi
Collaborator Author

wangkuiyi commented Feb 12, 2017

Example 1. Sharing Parameters between Layers

We use the 3-branch ranking model in this example. For your convenience, I copy-and-paste the model's topology as follows:

A -> f -\
Q -> f --> cost
B -> f -/

The following program trains the topology including the cost, and then uses a sub-network of the trained topology for inference:

def f(x):
    e = paddle.layer.embedding(x, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o

# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))

topology = paddle.layer.less_than(
               paddle.layer.cross_entropy(fA, fQ),
               paddle.layer.cross_entropy(fB, fQ))

# Derive the parameters required by the topology and create them.
parameters = paddle.parameters.create(topology)

# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)

# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA)

@wangkuiyi
Collaborator Author

wangkuiyi commented Feb 13, 2017

Example 2. Sharing Parameters between "Models"

We use GAN in this example. In the following example program, d0 and d1 correspond to the two training topologies of the GAN:

def G(x):
    # over-simplified example, as G has only one layer:
    return paddle.layer.fc(x, parameter_name="G")

def D(x, parameters_mutable):
    # again, over-simplified:
    return paddle.layer.fc(x, parameter_name="D",
                           parameters_mutable=parameters_mutable)

# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(
         D(G(paddle.layer.data()),
           False))  # Don't update the parameters of D here.

# Construct a second topology d1, which contains only D. By
# training this topology, we update the parameters of D. Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))

# Create parameters from a list of multiple topologies (models) so that
# parameters can be shared among these topologies.
parameters = paddle.parameters.create([d0, d1])

# Iterative training of GAN.
for ...:
    train(d0, parameters, reader=read_from_rng)
    train(d1, parameters, reader=read_from_realistic_images)

# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)

@reyoung
Collaborator

reyoung commented Feb 13, 2017

Maybe a parameter pool (parameters in the above code) plus network topologies is a good abstraction?

Neural network to be trained = a parameter pool + a training network topology.
Inference neural network = the same parameter pool + an inference network topology.

Is the Model or NeuralNetwork an important concept?
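Read against the examples above, this abstraction might look like the following sketch, which reuses the paddle.parameters.create / paddle.train / paddle.infer names from the earlier comments; train_topology, infer_topology, train_reader, and test_data are only illustrative:

# Sketch: one shared parameter pool, two topologies.
pool = paddle.parameters.create([train_topology, infer_topology])

# To-be-trained neural network = pool + training topology.
paddle.train(train_topology, pool, reader=train_reader)

# Inference neural network = the same pool + inference topology.
result = paddle.infer(infer_topology, pool, test_data)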

@helinwang
Contributor

helinwang commented Feb 13, 2017

Maybe instead of specifying which parameters not to update here:

d0 = paddle.layer.should_be_false(
         D(G(paddle.layer.data()),
           False)) # Don't update the parameter of D here.

We could specify in train which parameters to update, and by default update all of them.
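A sketch of what that could look like; the update_parameters argument is hypothetical and not an existing API, and D here is assumed to no longer need the parameters_mutable flag:

# Hypothetical sketch: choose the trainable parameters at train time instead
# of marking D immutable inside the topology.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))

paddle.train(d0, parameters,
             reader=read_from_rng,
             update_parameters=["G"])  # hypothetical argument; default would be to update all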

@reyoung
Collaborator

reyoung commented Feb 13, 2017

Inside the train function, add an event_handler callback.

Attached is the code from our earlier discussion:

def train_reader():
    yield {'pixel': pixels, 'label': labels}  # returns a data batch.

# The observe callback is used for plotting or logging the training process.
# The event parameter can be of various types; intermediate training results
# are carried in the event instance.
def callback(event):
    if isinstance(event, FinishTrainOneBatch):
        print event.pass_id, event.batch_id, "Cost = ", event.cost, "Error Rate = ", event.metric[0]
        print "output layer's output is ", event.activation['output']
        if event.batch_id % 1000 == 0:  # We could even save a checkpoint in the callback.
            with open('check_point_%d' % event.batch_id, 'w') as stream:
                optimizer.check_point(stream)
    else:
        pass

optimizer.train(train_reader=train_reader,
                test_reader=None,  # The test reader shares the same format as the train
                                   # reader; it could be None if there is no test data.
                cost=CrossEntropy(input=model.topology.output_layer,  # the network's output layer.
                                  label=DataReader("label")),  # The label comes from the data reader's 'label' field.
                metric=[ErrorRateMetric(input=model.topology.output_layer, label=DataReader("label"))],  # same logic as above
                observe_callback=callback)

@helinwang
Contributor

Added issue for separating updater and trainer: #1319

@jacquesqiao
Member

Maybe we need to put cost-related layers in a special namespace, like

paddle.layer.cost.cross_entropy
paddle.layer.cost.less_than
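For instance, the ranking cost from Example 1 would then be written as follows (illustrative only):

# Illustrative: Example 1's cost rewritten with the proposed cost namespace.
topology = paddle.layer.cost.less_than(
               paddle.layer.cost.cross_entropy(fA, fQ),
               paddle.layer.cost.cross_entropy(fB, fQ))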

@wangkuiyi
Collaborator Author

@reyoung Regarding your comment: it seems that given the event_handler mechanism, we don't need to pass metrics to the train function; instead, we can calculate those metrics in the event_handler and plot them when necessary?
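A minimal sketch of that idea, reusing the event fields from the callback example above; the running-average bookkeeping and the event_handler argument name are assumptions:

# Sketch: compute metrics inside the event handler instead of passing a
# metric list to train().  event.cost and FinishTrainOneBatch follow the
# callback example above; the event_handler argument name is an assumption.
recent_costs = []

def event_handler(event):
    if isinstance(event, FinishTrainOneBatch):
        recent_costs.append(event.cost)
        if event.batch_id % 100 == 0:
            print "batch", event.batch_id, "average cost =", sum(recent_costs) / float(len(recent_costs))
            del recent_costs[:]

paddle.train(topology, parameters, reader=read_ranking_model_data,
             event_handler=event_handler)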

wangkuiyi added a commit that referenced this issue Feb 14, 2017
Update API design doc according to discussions in issue #1315