Fix V2 API #2288

Merged · 3 commits · May 30, 2017
1 change: 1 addition & 0 deletions paddle/parameter/Parameter.h
@@ -324,6 +324,7 @@ class Parameter {
std::vector<std::shared_ptr<IParameterUpdaterHook>> updaterHooks_;

public:
void setSharedCount(int cnt) { sharedCount_ = cnt; }
Collaborator:
I can't find anywhere that this newly added method is called. Is it necessary?

Collaborator (author):

This was removed by reyoung in #2271. It's a function used internally.

Collaborator:

I see. Are we supposed to run git pull upstream develop && git push here so as to remove this line?

Collaborator (author):

What I mean is that we still need this function.

Collaborator:

It seems that the shared count should be separated from paddle::Parameter: if we want to support training multiple neural networks (i.e. topologies) in one process, we should let GradientMachine maintain the shared count. Each GradientMachine would handle one neural network, and a Parameter's shared count is only used inside one neural network.

Moreover, should we remove the logic for the backward callback / pipelined communication? It does not give us much performance gain, but it makes this logic a bit more complicated.

Collaborator:

It looks like the shared count should be moved out of paddle::Parameter and into GradientMachine. Since we want to support training multiple neural networks at the same time within one process, and the shared count is only related to a single neural network, shared_count should live in GradientMachine.

Going further, should we remove the pipelined communication and the callback in backward? Although this feature brings a small performance gain to Paddle's distributed communication, it adds some implementation complexity; removing it would simplify Paddle's implementation.

Collaborator:

The purpose of this PR is a fix. How about we make the follow-up edits in future PRs? @reyoung

int getSharedCount() { return sharedCount_; }

bool isSparse() { return config_.is_sparse(); }
31 changes: 17 additions & 14 deletions python/paddle/trainer/config_parser.py
@@ -3371,7 +3371,7 @@ def Import(config_file, local_args={}):
return Import


settings = dict(
DEFAULT_SETTING = dict(
batch_size=None,
mini_batch_size=None,
algorithm='async_sgd',
@@ -3404,6 +3404,8 @@ def Import(config_file, local_args={}):
adam_beta2=0.999,
adam_epsilon=1e-8, )

settings = copy.deepcopy(DEFAULT_SETTING)

settings_deprecated = dict(usage_ratio=1., )

trainer_settings = dict(
@@ -3544,23 +3546,32 @@ def update_g_config():
return g_config


def parse_config(trainer_config, config_arg_str):
def begin_parse(config_arg_str=''):
'''
@param trainer_config: can be a string of config file name or a function name
with config logic
@param config_arg_str: a string of the form var1=val1,var2=val2. It will be
passed to config script as a dictionary CONFIG_ARGS
'''
init_config_environment()
for hook in _parse_config_hooks:
hook()

config_args = {}

logger.findCaller = find_caller
logger.fatal = my_fatal

g_config.model_config.type = "nn"

global g_current_submodel, g_root_submodel
g_root_submodel = g_config.model_config.sub_models.add()
g_root_submodel.name = 'root'
g_root_submodel.is_recurrent_layer_group = False
g_current_submodel = g_root_submodel


def parse_config(trainer_config, config_arg_str):
begin_parse(config_arg_str)

config_args = {}

if config_arg_str:
config_args = dict([f.split('=') for f in config_arg_str.split(',')])

@@ -3573,14 +3584,6 @@ def parse_config(trainer_config, config_arg_str):
extension_module = importlib(extension_module_name)
g_extended_config_funcs = extension_module.get_config_funcs(g_config)

g_config.model_config.type = 'nn'

global g_current_submodel, g_root_submodel
g_root_submodel = g_config.model_config.sub_models.add()
g_root_submodel.name = 'root'
g_root_submodel.is_recurrent_layer_group = False
g_current_submodel = g_root_submodel

if hasattr(trainer_config, '__call__'):
trainer_config.func_globals.update(
make_config_environment("", config_args))
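To make the refactor above easier to follow, here is a minimal, hedged sketch of how the split entry points relate: begin_parse() only resets the global parser state (hooks, logger overrides, the root submodel), while parse_config() builds on top of it and additionally handles CONFIG_ARGS and executes the config. The my_network function below is an illustrative placeholder, not part of this PR, and running it assumes a Paddle v1 installation.

import paddle.trainer.config_parser as config_parser
from paddle.trainer_config_helpers import data_layer, fc_layer, outputs

def my_network():
    # illustrative v1-style config body; any v1 config callable would do here
    data = data_layer(name='x', size=10)
    outputs(fc_layer(input=data, size=1))

# Reset the global parser state only (this is what reset_parser() wraps).
config_parser.begin_parse()

# Full parse: begin_parse() plus CONFIG_ARGS handling and config execution.
# The second argument is exposed to the config script as the CONFIG_ARGS dict.
conf = config_parser.parse_config(my_network, 'with_cost=1')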
21 changes: 17 additions & 4 deletions python/paddle/trainer_config_helpers/config_parser_utils.py
@@ -12,15 +12,18 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import copy
import paddle.trainer.config_parser as config_parser
from paddle.proto.TrainerConfig_pb2 import OptimizationConfig
'''
This file is a wrapper of formal config_parser. The main idea of this file is to
separate different config logic into different functions, such as network configuration
and optimizer configuration.
'''

__all__ = [
"parse_trainer_config", "parse_network_config", "parse_optimizer_config"
"parse_trainer_config", "parse_network_config", "parse_optimizer_config",
"reset_parser"
]


@@ -34,5 +37,15 @@ def parse_network_config(network_conf, config_arg_str=''):


def parse_optimizer_config(optimizer_conf, config_arg_str=''):
config = config_parser.parse_config(optimizer_conf, config_arg_str)
return config.opt_config
config_parser.settings = copy.deepcopy(config_parser.DEFAULT_SETTING)
optimizer_conf()
opt_config = OptimizationConfig()
for k, v in config_parser.settings.iteritems():
if v is None:
continue
opt_config.__setattr__(k, v)
return opt_config


def reset_parser():
config_parser.begin_parse()
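For context, a rough usage sketch of the new optimizer path (illustrative only: in real v2 code the callable comes from the optimizer object and fills config_parser.settings through the v1 helpers; here the dict is written directly just to keep the sketch self-contained):

import paddle.trainer.config_parser as config_parser
from paddle.trainer_config_helpers.config_parser_utils import \
    parse_optimizer_config, reset_parser

def optimizer_conf():
    # parse_optimizer_config() resets `settings` to DEFAULT_SETTING, calls this
    # function, then copies every non-None entry into an OptimizationConfig.
    config_parser.settings['learning_rate'] = 1e-3
    config_parser.settings['batch_size'] = 128

opt_config = parse_optimizer_config(optimizer_conf)  # an OptimizationConfig proto
reset_parser()  # clears global parser state (via begin_parse) before the next topology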
6 changes: 6 additions & 0 deletions python/paddle/trainer_config_helpers/layers.py
@@ -285,6 +285,7 @@ def __init__(self,
assert size is not None
assert LayerType.is_layer_type(layer_type)
self.name = name
self.full_name = MakeLayerNameInSubmodel(name)
self.layer_type = layer_type
if parents is not None and type(parents) != list:
parents = [parents]
@@ -3489,6 +3490,11 @@ def map_in_links(x):

RecurrentLayerGroupEnd(name=name)

for layer_out in layer_outs:
# The previous full_name is the name inside the rnn group
# We need a full_name outside the rnn group
layer_out.full_name = MakeLayerNameInSubmodel(layer_out.name)

if len(layer_outs) == 1:
return layer_outs[0]
else:
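The intent of the full_name remapping above is easier to see with a toy illustration of submodel-scoped names. The '@<submodel>' suffix below reflects the commonly described MakeLayerNameInSubmodel convention, but the exact format is an assumption here, and make_layer_name_in_submodel is a made-up stand-in, not Paddle code.

def make_layer_name_in_submodel(name, current_submodel):
    # assumed scoping rule: the root submodel keeps the bare name,
    # nested submodels append an '@<submodel>' suffix
    if current_submodel == 'root':
        return name
    return '%s@%s' % (name, current_submodel)

# While the recurrent group is open, the output's full_name is group-scoped...
inside = make_layer_name_in_submodel('rnn_out', 'my_group')  # 'rnn_out@my_group'
# ...after RecurrentLayerGroupEnd, the same output must be addressable from the
# root submodel, hence the loop above re-assigns full_name outside the group.
outside = make_layer_name_in_submodel('rnn_out', 'root')     # 'rnn_out'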
247 changes: 48 additions & 199 deletions python/paddle/v2/config_base.py
@@ -14,206 +14,55 @@

import collections
import re
from paddle.trainer_config_helpers.default_decorators import wrap_name_default
import paddle.trainer_config_helpers as conf_helps
from topology import Topology


class LayerType(type):
def __new__(cls, name, bases, attrs):
method_name = attrs.get('METHOD_NAME', None)
if method_name is not None:
method = getattr(conf_helps, method_name)
if method.__doc__ is not None:
mapper = attrs.get("__map_docstr__", None)
if mapper is not None:
attrs['__doc__'] = LayerType.__map_docstr__(
mapper(method.__doc__),
method_name=method_name,
name=name)
else:
attrs['__doc__'] = LayerType.__map_docstr__(
method.__doc__, method_name=method_name, name=name)
return super(LayerType, cls).__new__(cls, name, bases, attrs)

@staticmethod
def __map_docstr__(doc, name, method_name):
assert isinstance(doc, basestring)

# replace LayerOutput to paddle.v2.config_base.Layer
doc = doc.replace("LayerOutput", "paddle.v2.config_base.Layer")

doc = doc.replace('ParameterAttribute',
'paddle.v2.attr.ParameterAttribute')

doc = re.sub(r'ExtraLayerAttribute[^\s]?',
'paddle.v2.attr.ExtraAttribute', doc)

# xxx_layer to xxx
doc = re.sub(r"(?P<name>[a-z]+)_layer", r"\g<name>", doc)

# XxxxActivation to paddle.v2.Activation.Xxxx
doc = re.sub(r"(?P<name>[A-Z][a-zA-Z]+)Activation",
r"paddle.v2.Activation.\g<name>", doc)

# TODO(yuyang18): Add more rules if needed.

__layer_map__ = {}


def __map_docstr__(doc, name):
if doc is None:
return doc

assert isinstance(doc, basestring)

# replace LayerOutput to paddle.v2.config_base.Layer
doc = doc.replace("LayerOutput", "paddle.v2.config_base.Layer")

doc = doc.replace('ParameterAttribute', 'paddle.v2.attr.ParameterAttribute')

doc = re.sub(r'ExtraLayerAttribute[^\s]?', 'paddle.v2.attr.ExtraAttribute',
doc)

# xxx_layer to xxx
doc = re.sub(r"(?P<name>[a-z]+)_layer", r"\g<name>", doc)

# XxxxActivation to paddle.v2.Activation.Xxxx
doc = re.sub(r"(?P<name>[A-Z][a-zA-Z]+)Activation",
r"paddle.v2.Activation.\g<name>", doc)

# xxx_evaluator to paddle.v2.evaluator.xxx
doc = re.sub(r"(?P<name>[a-z]+)_evaluator", r"evaluator.\g<name>", doc)

# TODO(yuyang18): Add more rules if needed.
return doc


def __convert_to_v2__(f, name, module):
def wrapped(*args, **xargs):
out = f(*args, **xargs)
outs = out
if not isinstance(out, collections.Sequence):
outs = [out]
for l in outs:
if isinstance(l, conf_helps.LayerOutput):
__layer_map__[l.full_name] = l
return out

wrapped.__doc__ = __map_docstr__(f.__doc__, name)
wrapped.__name__ = name
wrapped.__module__ = module

return wrapped


class Layer(object):
__metaclass__ = LayerType

def __init__(self, name=None, parent_layers=None):
assert isinstance(parent_layers, dict)
self.name = name
self.__context__ = {}
self.__parent_layers__ = parent_layers
# some layer may have some extra parent layer
self.__extra_parent__ = []
# used for evaluator.
self.__children_layers__ = []

def extra_parent(self):
return self.__extra_parent__

def append_extra_parent(self, parent):
self.__extra_parent__.append(parent)

def append_child(self, layer, parent_names):
self.__children_layers__.append((layer, parent_names))

def to_proto(self, context):
"""
function to set proto attribute
"""
self.__context__ = context

# STEP: short cut if this layer is parsed before.
if self.context_name() in context:
if self.use_context_name():
return context[self.context_name()]
else:
return context[self.name]

# STEP: parse extra_parent that is not used by this layer but must
# be parsed before this layer.
for p in self.__extra_parent__:
p.to_proto(context=context)

# STEP: parse parent that is used by this layer, get the result and
# insert into kwargs of the next layer's to_proto_impl method.
kwargs = dict()
for layer_name in self.__parent_layers__:
if not isinstance(self.__parent_layers__[layer_name],
collections.Sequence):
v1_layer = self.__parent_layers__[layer_name].to_proto(
context=context)
else:
v1_layer = map(lambda x: x.to_proto(context=context),
self.__parent_layers__[layer_name])
kwargs[layer_name] = v1_layer

# STEP: parse myself and add myself into context.
ret_val = self.to_proto_impl(**kwargs)
if self.context_name() is not None \
and self.context_name() not in context:
context[self.context_name()] = ret_val

# STEP: parse children that should be parsed after this layer.
for layer, pnames in self.__children_layers__:
drop = False

# child will only be parsed if all parents are in context.
for pname in pnames:
if pname not in context:
drop = True
break
if drop:
continue
layer.to_proto(context=context)

# STEP: return v1 layer result
if self.context_name() is None:
return ret_val
elif self.use_context_name():
return context[self.context_name()]
else:
return context[self.name]

def to_proto_impl(self, **kwargs):
raise NotImplementedError()

def context_name(self):
"""
Context name means the context which stores `to_proto_impl` result.
If multiple layer share same context_name, the `to_proto_impl` of them
will be invoked only once.
"""
return self.name

def use_context_name(self):
return False

def calculate_size(self):
"""
lazy calculate size of the layer, should be called when to_proto_impl of
this layer is called.
:return:
"""
return self.__context__[self.context_name()].size

def attr(self):
topo = Topology(self)
return topo.get_layer_proto(self.name)


def __convert_to_v2__(method_name,
parent_names,
is_default_name=True,
attach_parent=False):
if is_default_name:
wrapper = wrap_name_default(name_prefix=method_name)
else:
wrapper = None

class V2LayerImpl(Layer):
METHOD_NAME = method_name

def __init__(self, **kwargs):
parent_layers = dict()
other_kwargs = dict()
for pname in parent_names:
if pname in kwargs:
parent_layers[pname] = kwargs[pname]

if attach_parent:
pnames = [x.context_name() for x in parent_layers.values()]

for pname in parent_layers:
layers = kwargs[pname]
if not isinstance(layers, collections.Sequence):
layers = [layers]

for layer in layers:
layer.append_child(self, pnames)

for key in kwargs.keys():
if key not in parent_names:
other_kwargs[key] = kwargs[key]

name = kwargs.get('name', None)
super(V2LayerImpl, self).__init__(name, parent_layers)
self.__other_kwargs__ = other_kwargs

if wrapper is not None:
__init__ = wrapper(__init__)

def to_proto_impl(self, **kwargs):
args = dict()
for each in kwargs:
args[each] = kwargs[each]
for each in self.__other_kwargs__:
args[each] = self.__other_kwargs__[each]
return getattr(conf_helps, method_name)(**args)

return V2LayerImpl
Layer = conf_helps.LayerOutput
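To illustrate the slimmed-down wrapper (a hedged sketch, not code from this PR; the choice of fc_layer and the module string are assumptions): the new __convert_to_v2__ simply forwards to the v1 helper and records every LayerOutput it produces in __layer_map__ under its full_name, and the v2 Layer type is now just conf_helps.LayerOutput.

import paddle.trainer_config_helpers as conf_helps
from paddle.trainer_config_helpers.config_parser_utils import reset_parser
from paddle.v2.config_base import __convert_to_v2__, __layer_map__

reset_parser()  # initialize global parser state before building layers eagerly

# Wrap a v1 helper into a v2-style callable.
fc = __convert_to_v2__(conf_helps.fc_layer, name='fc', module='paddle.v2.layer')

data = conf_helps.data_layer(name='pixel', size=784)
hidden = fc(input=data, size=128)  # a conf_helps.LayerOutput, returned directly

assert isinstance(hidden, conf_helps.LayerOutput)
assert hidden.full_name in __layer_map__  # registered by the wrapper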