An error after updating to the latest version of the models #786

Open
pacelu opened this issue May 21, 2019 · 2 comments

Comments

@pacelu

commented May 21, 2019

Ensuring input files exist [####################################] 100%
Checking for existing output [####################################] 100%
2019-05-21 09:43:57:rastervision.runner.local_experiment_runner: INFO - Saving command configuration to /root/data/test_out4/chip/potsdam-seg/command-config.json...
2019-05-21 09:43:57:rastervision.data.raster_source.geotiff_source: INFO - Loading GeoTiff files...
2019-05-21 09:43:57:rastervision.data.raster_source.geotiff_source: INFO - Loading GeoTiff files...
2019-05-21 09:43:57:rastervision.data.raster_source.geotiff_source: INFO - Loading GeoTiff files...
2019-05-21 09:43:57:rastervision.data.raster_source.geotiff_source: INFO - Loading GeoTiff files...
2019-05-21 09:43:57:rastervision.data.raster_source.geotiff_source: INFO - Loading GeoTiff files...
2019-05-21 09:43:57:rastervision.data.raster_source.geotiff_source: INFO - Loading GeoTiff files...
Making training chips...
2019-05-21 09:43:57:rastervision.task.task: INFO - Making train chips for scene: 01
2019-05-21 09:44:11:rastervision.backend.tf_deeplab: INFO - Creating TFRecord
/usr/local/lib/python3.6/dist-packages/pluginbase.py:439: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
fromlist, level)
2019-05-21 09:44:31:rastervision.task.task: INFO - Making train chips for scene: 02
2019-05-21 09:44:45:rastervision.backend.tf_deeplab: INFO - Creating TFRecord
2019-05-21 09:45:04:rastervision.task.task: INFO - Making validation chips for scene: 01
2019-05-21 09:45:17:rastervision.backend.tf_deeplab: INFO - Creating TFRecord
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - Merging TFRecords
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/backend/tf_deeplab.py:78: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - 1000 records
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - Merging TFRecords
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - 500 records
2019-05-21 09:45:37:rastervision.runner.local_experiment_runner: INFO - Saving command configuration to /root/data/test_out4/train/potsdam-seg/command-config.json...
Training model...
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - Setting up local input and output directories
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - Downloading training data
2019-05-21 09:45:37:rastervision.backend.tf_deeplab: INFO - Downloading and untarring initial checkpoint
2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - tfdl_config=initialize_last_layer: "false"
last_layers_contain_logits_only: "false"
fine_tune_batch_norm: "true"
base_learning_rate: "0.001"
decoder_output_stride: 1
output_stride: 16
model_variant: "mobilenet_v2"
train_split: "train"
dataset: "custom"
train_batch_size: 8
training_number_of_steps: 5000
train_crop_size: 256
train_crop_size: 256
dl_custom_train: 11520
dl_custom_validation: 1728
save_interval_secs: 600
save_summaries_secs: 600
num_clones: 2

2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - Training steps=5000
2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - DL_CUSTOM_TRAIN=11520
2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - DL_CUSTOM_VALIDATION=1728
2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - DL_CUSTOM_CLASSES=8
2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - Starting training process
2019-05-21 09:45:38:rastervision.backend.tf_deeplab: INFO - Waiting for training and tensorboard processes
/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

INFO:tensorflow:Training on train set
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py:423: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
File "/opt/tf-models/deeplab/train.py", line 513, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/opt/tf-models/deeplab/train.py", line 463, in main
dataset.get_one_shot_iterator(), dataset.num_of_classes,
File "/usr/local/lib/python3.6/dist-packages/tensorflow/models-master/research/deeplab/datasets/data_generator.py", line 325, in get_one_shot_iterator
.map(self._preprocess_image, num_parallel_calls=self.num_readers))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1584, in map
self, map_func, num_parallel_calls, preserve_cardinality=False))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2771, in __init__
input_dataset, map_func, use_inter_op_parallelism, preserve_cardinality)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2737, in __init__
map_func, self._transformation_name(), dataset=input_dataset)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2124, in __init__
self._function.add_to_graph(ops.get_default_graph())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/function.py", line 490, in add_to_graph
self._create_definition_if_needed()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/function.py", line 341, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/function.py", line 355, in _create_definition_if_needed_impl
whitelisted_stateful_ops=self._whitelisted_stateful_ops)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/function.py", line 883, in func_graph_from_py_func
outputs = func(*func_graph.inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2099, in tf_data_structured_function_wrapper
ret = func(*nested_args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/models-master/research/deeplab/datasets/data_generator.py", line 287, in _preprocess_image
crop_width=self.crop_size[1],
IndexError: list index out of range
2019-05-21 09:45:40:rastervision.backend.tf_deeplab: INFO - Exporting frozen graph (/root/data/test_out4/train/potsdam-seg/model)
Traceback (most recent call last):
File "/usr/local/bin/rastervision", line 11, in <module>
load_entry_point('rastervision==0.8.1', 'console_scripts', 'rastervision')()
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/cli/main.py", line 159, in run
dry_run=dry_run)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/runner/experiment_runner.py", line 187, in run
self._run_experiment(command_dag)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/runner/local_experiment_runner.py", line 35, in _run_experiment
run_commands(tmp_dir)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/runner/local_experiment_runner.py", line 29, in run_commands
command.run(tmp_dir)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/command/train_command.py", line 16, in run
self.task.train(tmp_dir)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/task/task.py", line 136, in train
self.backend.train(tmp_dir)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/backend/tf_deeplab.py", line 608, in train
num_classes, tfdl_config)
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/backend/tf_deeplab.py", line 382, in get_export_args
get_latest_checkpoint(train_logdir_local)))
File "/usr/local/lib/python3.6/dist-packages/rastervision-0.8.1-py3.6.egg/rastervision/backend/tf_deeplab.py", line 262, in get_latest_checkpoint
latest = sorted(zip(times, ckpts))[-1][1]
IndexError: list index out of range
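
The second traceback above looks like a knock-on effect of the first: training crashed before any checkpoint was written, so `sorted(zip(times, ckpts))[-1][1]` indexes into an empty list. A minimal sketch of a guarded lookup follows. The helper name `get_latest_checkpoint_or_none` and the `model.ckpt-*.index` file pattern are assumptions for illustration, not the actual Raster Vision code.

```python
import glob
import os
from typing import Optional


def get_latest_checkpoint_or_none(train_logdir: str) -> Optional[str]:
    """Return the newest checkpoint prefix in train_logdir, or None if there is none.

    Hypothetical helper: it reuses the sorted(zip(times, ckpts))[-1][1]
    expression from the traceback, but guards the empty case first.
    """
    # Assumption: each checkpoint has an index file such as model.ckpt-5000.index
    ckpts = glob.glob(os.path.join(train_logdir, 'model.ckpt-*.index'))
    if not ckpts:
        # Nothing was saved, e.g. because the training process crashed early.
        return None
    times = [os.path.getmtime(c) for c in ckpts]
    latest = sorted(zip(times, ckpts))[-1][1]
    # Strip the ".index" suffix to get the checkpoint prefix.
    return latest[:-len('.index')]
```

With a guard like this, the caller could report that training produced no checkpoint instead of raising a second, less informative `IndexError`.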

The server configuration:
Ubuntu 16.04
TensorFlow 1.13.1
models: the latest version from GitHub
rastervision: 0.8.1
I updated because I saw that the latest models code changed DeepLab's multi-GPU parallel function:
image
The old models code:
image
Where I think the problem is:
image

@lossyrob

Member

commented May 23, 2019

@pacelu are you using the http://github.com/azavea/models fork, or the tensorflow/models repository? We aren't updating Raster Vision to keep in line with the tensorflow/models repository directly; instead, we pin tags on azavea/models as we release versions of Raster Vision. I'm wondering if there is an upstream breaking change that is causing this?
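
As a sketch of how an upstream change could produce the first `IndexError`: the logged `tfdl_config` lists `train_crop_size: 256` twice, and the traceback fails at `self.crop_size[1]`. If the newer DeepLab code read that option as one comma-separated value instead of a repeated flag (an assumption, not confirmed in this thread), only one element would survive parsing. Both parser functions below are hypothetical stand-ins, not the real flag-handling code.

```python
from typing import List


def parse_repeated_flag(values: List[str]) -> List[int]:
    """Old style (assumed): the flag is passed once per dimension."""
    return [int(v) for v in values]


def parse_comma_separated_flag(values: List[str]) -> List[int]:
    """New style (assumed): one 'H,W' string; if repeated, the last value wins."""
    return [int(v) for v in values[-1].split(',')]


# As in the logged tfdl_config: train_crop_size appears twice, each time as 256.
flag_values = ['256', '256']

old_crop = parse_repeated_flag(flag_values)
new_crop = parse_comma_separated_flag(flag_values)

print(old_crop)  # [256, 256]
print(new_crop)  # [256]

try:
    crop_width = new_crop[1]
except IndexError as exc:
    # Same error type and message as the failure in data_generator.py
    print('IndexError:', exc)
```

If that is what is happening, a single `256,256` value would parse to two elements under the assumed new style, which is one reason mixing Raster Vision 0.8.1 with an unpinned models checkout could break.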

@pacelu

Author

commented May 24, 2019

> @pacelu are you using the http://github.com/azavea/models fork, or the tensorflow/models repository? We aren't updating raster vision to keep in line with the tensorflow/models repository directly, and instead pinning tags to azavea/models as we release versions of Raster Vision. I'm wondering if there is an upstream breaking change that is causing this?

I am using tensorflow/models, because I saw that it updated DeepLab's multi-GPU parallel function. With the current version, multi-GPU training is too slow, slower than training on a single GPU (see #777).
image
