Error raised during AdaBERT finetune #11

Closed
baiyfbupt opened this issue Oct 19, 2020 · 1 comment

baiyfbupt commented Oct 19, 2020

Python: 3.6
tensorflow: 1.12.3

https://github.com/alibaba/EasyTransfer/tree/master/scripts/knowledge_distillation#33-finetune%E5%B9%B6%E9%A2%84%E6%B5%8B

Error message:

Parameters:
  arch_l2_reg=0.001
  arch_opt_lr=0.0001
  arch_path=./adabert_models/search/best/arch.json
  checkpointDir=
  config=None
  distribution_strategy=None
  emb_pathes=./adabert_models/search/best/wemb.npy,./adabert_models/search/best/pemb.npy
  embed_size=128
  f=
  h=False
  help=False
  helpfull=False
  helpshort=False
  is_pair_task=1
  is_training=True
  job_name=worker
  loss_beta=4.0
  loss_gamma=0.8
  max_save=1
  mode=None
  modelZooBasePath=/root/.eztransfer_modelzoo
  model_dir=./adabert_models/finetune/
  model_l2_reg=0.0003
  model_opt_lr=5e-06
  num_classes=2
  num_core_per_host=1
  num_token=30522
  open_ess=None
  outputs=None
  save_steps=30
  searched_model=./adabert_models/search/best
  seq_length=128
  tables=None
  task_index=0
  temp_decay_steps=18000
  train_batch_size=32
  train_file=./mrpc/train_mrpc_output_logits.txt,./mrpc/dev_mrpc_output_logits.txt
  train_steps=30
  usePAI=False
  workerCPU=1
  workerCount=1
  workerGPU=1
  worker_hosts=localhost:5001

INFO:tensorflow:Using config: {'_model_dir': './adabert_models/finetune/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 30, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 64
inter_op_parallelism_threads: 64
gpu_options {
  per_process_gpu_memory_fraction: 1.0
  allow_growth: true
  force_gpu_compatible: true
}
allow_soft_placement: true
graph_options {
  rewrite_options {
    constant_folding: OFF
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb74ca82518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function get_model_fn.<locals>.model_fn at 0x7fb747b5ca60>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:num_parallel_batches 8
INFO:tensorflow:shuffle_buffer_size 1024
INFO:tensorflow:prefetch_buffer_size 1
INFO:tensorflow:batch_size 32
INFO:tensorflow:distribution_strategy None
INFO:tensorflow:num_micro_batches 1
INFO:tensorflow:input_schema labels:int:1,ids:int:128,mask:int:128,seg_ids:int:128,prob_logits:float:26
INFO:tensorflow:./mrpc/train_mrpc_output_logits.txt, total number of training examples 3668
INFO:tensorflow:num_parallel_batches 8
INFO:tensorflow:shuffle_buffer_size 1024
INFO:tensorflow:prefetch_buffer_size 1
INFO:tensorflow:batch_size 32
INFO:tensorflow:distribution_strategy None
INFO:tensorflow:num_micro_batches 1
INFO:tensorflow:input_schema labels:int:1,ids:int:128,mask:int:128,seg_ids:int:128,prob_logits:float:26
INFO:tensorflow:./mrpc/dev_mrpc_output_logits.txt, total number of eval examples 408
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 30 or save_checkpoints_secs None.
INFO:tensorflow:Calling model_fn.
====== 1 cells ======
searched op distributions ======>
{(0, 2): array([0.09165421, 0.10016984, 0.09765881, 0.10739392, 0.11919089,
       0.11196633, 0.0886544 , 0.0918505 , 0.09910965, 0.09235148],
      dtype=float32), (1, 2): array([0.09706013, 0.09356507, 0.0981027 , 0.12304117, 0.11524444,
       0.11587463, 0.08445919, 0.08487304, 0.09398084, 0.09379876],
      dtype=float32), (0, 3): array([0.09431954, 0.09327073, 0.09237881, 0.11603428, 0.11155295,
       0.10067917, 0.08867473, 0.09511623, 0.10352677, 0.10444669],
      dtype=float32), (1, 3): array([0.10270569, 0.09890872, 0.09799318, 0.11776961, 0.1114557 ,
       0.10589372, 0.08784777, 0.08607294, 0.09638216, 0.09497037],
      dtype=float32), (2, 3): array([0.10986389, 0.10315704, 0.09444612, 0.10787328, 0.10679101,
       0.1009643 , 0.09225149, 0.09034712, 0.09656148, 0.09774421],
      dtype=float32), (0, 4): array([0.09107016, 0.08650941, 0.08797061, 0.11874099, 0.10631699,
       0.11541508, 0.09348091, 0.10446716, 0.10441516, 0.09161358],
      dtype=float32), (1, 4): array([0.09745023, 0.09436949, 0.08907194, 0.12406871, 0.12098379,
       0.10303614, 0.08979508, 0.0915589 , 0.0945749 , 0.09509072],
      dtype=float32), (2, 4): array([0.10559002, 0.09695413, 0.09311736, 0.10958813, 0.10113393,
       0.10502651, 0.10361147, 0.08967911, 0.09141986, 0.10387935],
      dtype=float32), (3, 4): array([0.10880414, 0.10283509, 0.10102597, 0.1103332 , 0.10920084,
       0.09756382, 0.087904  , 0.09060738, 0.09158863, 0.10013673],
      dtype=float32)}
derived arch ======>
{(1, 2): 3, (0, 2): 4, (1, 3): 3, (0, 3): 3, (1, 4): 3, (0, 4): 3}

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 983, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int64: 'Tensor("c0/node2/edge0to2/Reshape_1:0", shape=(1, 1, 1, 1), dtype=int64)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_adabert.py", line 322, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_adabert.py", line 280, in main
    estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
    return self.run_local()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "main_adabert.py", line 183, in model_fn
    given_arch=given_arch)
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 117, in __init__
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 270, in _build_graph
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 471, in build_cell
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 501, in build_node
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 561, in build_edge
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1131, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type int64 that does not match type float32 of argument 'x'.
ScarletPan (Collaborator) commented:

Please use Python 2.7 + TensorFlow 1.12 to avoid this error.

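For anyone who hits this on Python 3: the traceback above reduces to a TensorFlow 1.x dtype mismatch, where an int64 tensor is multiplied with a float32 tensor inside build_edge. A minimal sketch of that error class and the usual cast-based workaround is below (this is a generic illustration of the TF 1.x behaviour, not a patch to modeling_adabert.py):

```python
import tensorflow as tf  # TF 1.x

x = tf.constant([1.0, 2.0], dtype=tf.float32)
y = tf.constant([1, 2], dtype=tf.int64)

# z = x * y
# -> TypeError: Input 'y' of 'Mul' Op has type int64 that does not
#    match type float32 of argument 'x'.

# Explicitly casting the int64 operand to float32 avoids the error.
z = x * tf.cast(y, tf.float32)

with tf.Session() as sess:
    print(sess.run(z))  # [1. 4.]
```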