Training stops before epoch 0 after loading best.torch #218

cvKDean · 2019-09-25T10:44:00Z

Good day,

I am trying to train on my own dataset, as in issue #215. I opted to load the weights from the model trained on the crowdAI dataset and then continue training on my own images from there.

Using issue #160 as a reference, I loaded the weights from best.torch.
(By the way, is it correct to use self.load('.../experiments/mapping_challenge_baseline/checkpoints/unet/best.torch')?)
I also set self._initialize_model_weights = None.
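When continuing training from a checkpoint, any random weight initialization that runs at the start of training has to be skipped, or it overwrites the weights that were just loaded. The sketch below is hypothetical: TinyModel, its dict of "weights", and the pretrained flag are illustrative names, not this repository's classes. It guards the initializer with a flag instead of replacing the method with None, because if the training code still calls the method, a None attribute raises TypeError: 'NoneType' object is not callable.

```python
import random


class TinyModel:
    """Hypothetical model whose 'weights' are a plain dict of floats."""

    def __init__(self):
        self.weights = {"w": 0.0, "b": 0.0}
        self.pretrained = False

    def load(self, state):
        # Stand-in for restoring a checkpoint such as best.torch.
        self.weights = dict(state)
        self.pretrained = True

    def _initialize_model_weights(self):
        # Random init: only appropriate when training from scratch.
        self.weights = {k: random.uniform(-1, 1) for k in self.weights}

    def fit(self, steps):
        # Skip re-initialization so the loaded checkpoint is not discarded.
        if not self.pretrained:
            self._initialize_model_weights()
        for _ in range(steps):
            self.weights["w"] += 0.1  # placeholder for an optimizer update
        return self.weights
```

Calling load() before fit() leaves the checkpoint values in place and only the placeholder updates are applied on top of them.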

However, it threw an error: 'module' object has no attribute '_rebuild_tensor_v2'
I was able to fix that via this thread.
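This error typically appears when a checkpoint written by a newer PyTorch release is loaded with an older one that lacks torch._utils._rebuild_tensor_v2. One widely shared workaround (not necessarily the one in the linked thread) is to back-fill the missing function before calling torch.load; upgrading PyTorch is the cleaner fix where possible. In the sketch below, ensure_rebuild_tensor_v2 is a hypothetical helper name; it takes the utils module as a parameter so it can be exercised without PyTorch installed, and it assumes the older torch._utils._rebuild_tensor(storage, storage_offset, size, stride) helper is present.

```python
def ensure_rebuild_tensor_v2(utils_module):
    """Back-fill _rebuild_tensor_v2 on an older torch._utils module.

    Returns True if the shim was installed, False if it already existed.
    """
    if hasattr(utils_module, "_rebuild_tensor_v2"):
        return False

    def _rebuild_tensor_v2(storage, storage_offset, size, stride,
                           requires_grad, backward_hooks):
        # Delegate to the older helper, then restore the extra fields
        # that newer checkpoints store alongside each tensor.
        tensor = utils_module._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor

    utils_module._rebuild_tensor_v2 = _rebuild_tensor_v2
    return True


# Typical use with an older PyTorch, before loading the checkpoint:
#   import torch._utils
#   ensure_rebuild_tensor_v2(torch._utils)
#   state = torch.load(path_to_best_torch, map_location="cpu")
```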

Another error occurred, which I also fixed via this thread.

Now, running python main.py train --pipeline_name unet_weighted no longer throws any errors, but training never seems to start (nothing is printed for epoch 0).
Here is the full console output:

/home/USER/Developer/anaconda3/envs/mapping/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/home/USER/Developer/ML/open-solution-mapping-challenge/src/utils.py:132: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
SHOW-325
https://ui.neptune.ml/shared/showroom/e/SHOW-325
2019-09-26 01-18-48 mapping-challenge >>> training
2019-09-26 01-18-54 steps >>> step xy_train adapting inputs
2019-09-26 01-18-54 steps >>> step xy_train transforming...
2019-09-26 01-18-54 steps >>> step xy_inference adapting inputs
2019-09-26 01-18-54 steps >>> step xy_inference transforming...
2019-09-26 01-18-54 steps >>> step loader adapting inputs
2019-09-26 01-18-54 steps >>> step loader transforming...
2019-09-26 01-18-54 steps >>> step unet unpacking inputs
2019-09-26 01-18-54 steps >>> step unet loading transformer...
2019-09-26 01-18-55 steps >>> step unet transforming...
2019-09-26 01-18-58 steps >>> step mask_resize adapting inputs
2019-09-26 01-18-58 steps >>> step mask_resize transforming...
100%|##########| 16/16 [00:01<00:00,  8.38it/s]
2019-09-26 01-18-59 steps >>> step mask_resize caching outputs...
2019-09-26 01-18-59 steps >>> step category_mapper adapting inputs
2019-09-26 01-18-59 steps >>> step category_mapper transforming...
100%|##########| 16/16 [00:00<00:00, 1761.53it/s]
2019-09-26 01-19-00 steps >>> step mask_erosion adapting inputs
2019-09-26 01-19-00 steps >>> step mask_erosion transforming...
100%|##########| 16/16 [00:00<00:00, 136956.87it/s]
2019-09-26 01-19-00 steps >>> step labeler adapting inputs
2019-09-26 01-19-00 steps >>> step labeler transforming...
100%|##########| 16/16 [00:00<00:00, 132.53it/s]
2019-09-26 01-19-00 steps >>> step mask_dilation adapting inputs
2019-09-26 01-19-00 steps >>> step mask_dilation transforming...
100%|##########| 16/16 [00:00<00:00, 92.15it/s]
2019-09-26 01-19-00 steps >>> step mask_resize loading output...
2019-09-26 01-19-00 steps >>> step score_builder adapting inputs
2019-09-26 01-19-00 steps >>> step score_builder transforming...
100%|##########| 16/16 [00:00<00:00, 18.44it/s]
2019-09-26 01-19-01 steps >>> step output adapting inputs
2019-09-26 01-19-01 steps >>> step output transforming...
(mapping) USER@debian:~/Developer/ML/open-solution-mapping-challenge$

No errors are reported, but training does not seem to start. Do you have any idea why this happens? Thank you.
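In the console output above, the unet step logs "step unet loading transformer..." followed directly by "step unet transforming...", with no fitting message. In step-based pipelines of this kind, one plausible reading is that a previously saved unet transformer was found on disk and reused, so no training loop ran at all. The sketch below is a simplified, hypothetical model of that cache-or-fit decision; run_step, CountingTransformer, and the force_fitting argument are illustrative names, not this repository's API.

```python
import os


class CountingTransformer:
    """Toy trainable step that records how many times it was fitted."""

    def __init__(self):
        self.fit_calls = 0

    def fit_transform(self, x):
        self.fit_calls += 1  # stands in for a full training run
        return self.transform(x)

    def transform(self, x):
        return [v * 2 for v in x]

    def save(self, path):
        with open(path, "w") as f:
            f.write("trained")

    def load(self, path):
        with open(path) as f:
            self.state = f.read()


def run_step(name, transformer, inputs, cache_dir, force_fitting=False, log=print):
    """Hypothetical cache-or-fit decision for one pipeline step."""
    path = os.path.join(cache_dir, name)
    if os.path.exists(path) and not force_fitting:
        # A saved transformer exists, so it is reused and training is skipped.
        log(f"step {name} loading transformer...")
        transformer.load(path)
        log(f"step {name} transforming...")
        return transformer.transform(**inputs)
    # Nothing saved (or refitting forced): train, then persist for next time.
    log(f"step {name} fitting and transforming...")
    output = transformer.fit_transform(**inputs)
    log(f"step {name} saving transformer...")
    transformer.save(path)
    return output
```

Under this reading, a second run in the same experiment directory behaves like a second call to run_step: the saved transformer is loaded and nothing is fitted. Removing the saved unet transformer, or forcing that step to refit, would be the things to check; the exact directory and option name to use would need to be confirmed in this repository's pipeline configuration.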


zeciro · 2021-03-24T04:16:58Z

Hi,
I encountered the same issue.
"python main.py train --pipeline_name unet" works fine, but "unet_weighted" does not start training.

Have you found a fix?

data-overload · 2021-03-25T15:38:52Z

I'm also trying to train on my own images starting from either of the contest weights (unet or scoring_model), but I'm not sure how to load those weights so that training continues from them.

@zeciro running that command with either unet or unet_weighted gives essentially the same console output as the OP's (no errors, but no output seems to be produced).
