First, we train a baseline model; then we restore the baseline model's parameters and continue training. When restoring parameters, our code is as follows:
vars_to_warm_start = ['^((?!Adam)(?!pos_dense).)*$']
variables = self.restore_variables()
restorer = tf.compat.v1.train.Saver(var_list=variables, max_to_keep=1)
restorer.restore(session, base_checkpoint_path)
saver = tf.compat.v1.train.Saver(max_to_keep=1)
def restore_variables(self):
    list_of_vars = None
    if 'vars_to_warm_start' in _Hyperparams:
        vars_to_warm_start = _Hyperparams['vars_to_warm_start']
        if isinstance(vars_to_warm_start, str) or vars_to_warm_start is None:
            # Both vars_to_warm_start = '.*' and vars_to_warm_start = None will match
            # everything (in TRAINABLE_VARIABLES) here.
            self.logger.info("Warm-starting variables only in GLOBAL_VARIABLES.")
            list_of_vars = ops.get_collection(
                ops.GraphKeys.GLOBAL_VARIABLES, scope=vars_to_warm_start)
            self.logger.info('Loading base model variables: {}'.format(list_of_vars))
            saveable_objects = tf.get_collection(tf.GraphKeys.SAVEABLE_OBJECTS,
                                                 scope=vars_to_warm_start)
            self.logger.info('Loading saveable variables: {}'.format(saveable_objects))
            list_of_vars += saveable_objects
        elif isinstance(vars_to_warm_start, list):
            if all(isinstance(v, str) for v in vars_to_warm_start):
                self.logger.info("Warm-starting partial variables in GLOBAL_VARIABLES.")
                list_of_vars = []
                saveable_objects = []
                for v in vars_to_warm_start:
                    list_of_vars += ops.get_collection(
                        ops.GraphKeys.GLOBAL_VARIABLES, scope=v)
                    saveable_objects += tf.get_collection(tf.GraphKeys.SAVEABLE_OBJECTS,
                                                          scope=v)
                self.logger.info('Loading base model variables: {}'.format(list_of_vars))
                self.logger.info('Loading saveable variables: {}'.format(saveable_objects))
                list_of_vars += saveable_objects
    return list_of_vars
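For context, the `vars_to_warm_start` pattern `^((?!Adam)(?!pos_dense).)*$` uses negative lookaheads to match any variable name that contains neither `Adam` nor `pos_dense`, i.e. it excludes Adam slot variables and the `pos_dense` scope from warm-starting. A minimal standalone check of the pattern (the variable names below are illustrative, not necessarily from our graph):

```python
import re

# Negative lookaheads at every position: the name matches only if no
# position starts the substring "Adam" or "pos_dense".
pattern = re.compile(r'^((?!Adam)(?!pos_dense).)*$')

names = [
    'feature_processing/imei_embedding/embedding_weights',         # kept
    'feature_processing/imei_embedding/embedding_weights/Adam',    # excluded
    'feature_processing/imei_embedding/embedding_weights/Adam_1',  # excluded
    'pos_dense/kernel',                                            # excluded
]

kept = [n for n in names if pattern.match(n)]
print(kept)  # only the first name survives
```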
We enable GlobalStepEvict for the imei feature at both stages.
If we enable GlobalStepEvict when restoring the baseline model, saving a checkpoint via the saver fails. The core dump info is:
tensorflow::SaveV2::Compute (this=0x7f8fd20bdec0, context=<optimized out>) at
tensorflow/core/kernels/save_restore_v2_ops.cc:177
tensor_name = "feature_processing/imei_embedding/embedding_weights/Adam"
It seems that there is a problem when saving the Adam parameters.
If we restore only tf.trainable_variables(), the checkpoint is saved successfully; it fails when we restore tf.global_variables(), which includes the Adam parameters.
If we disable GlobalStepEvict when restoring the baseline model, training runs normally, but the loss and AUC are poor.
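Since restoring only tf.trainable_variables() works, one workaround we considered is filtering the restore list by name so that optimizer slot variables (e.g. `<var>/Adam`, `<var>/Adam_1`) are never handed to the restorer. A rough, name-based sketch (the helper and suffix list are our own, not a TensorFlow API):

```python
def filter_out_optimizer_slots(var_names, slot_suffixes=('Adam', 'Adam_1')):
    """Drop optimizer slot variables (e.g. '<var>/Adam') from a restore list.

    Works purely on variable names; assumes slot variables follow the
    usual TF1 naming convention of appending the slot name as the last
    path component.
    """
    def is_slot(name):
        return name.rsplit('/', 1)[-1] in slot_suffixes
    return [n for n in var_names if not is_slot(n)]

names = [
    'feature_processing/imei_embedding/embedding_weights',
    'feature_processing/imei_embedding/embedding_weights/Adam',
    'feature_processing/imei_embedding/embedding_weights/Adam_1',
]
print(filter_out_optimizer_slots(names))  # only the non-slot weight remains
```

In the real graph one would apply this to `[v.op.name for v in tf.global_variables()]` and pass the surviving variables to the Saver, but that part depends on the model and is untested here.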