
Training stops before epoch 0 after loading best.torch #218

Open
cvKDean opened this issue Sep 25, 2019 · 2 comments

Comments

@cvKDean

cvKDean commented Sep 25, 2019

Good day,

I am trying to train on my own dataset, as in issue #215. I chose to load the weights of the model trained on the crowdAI dataset and then continue training on my own images from there.

Using issue #160 as a reference, I loaded the weights from `best.torch`.
(By the way, is it correct to use `self.load('.../experiments/mapping_challenge_baseline/checkpoints/unet/best.torch')`?)
I also set `self._initialize_model_weights = None`.

However, it raised an error: `'module' object has no attribute '_rebuild_tensor_v2'`, which I was able to fix via this thread.

Another error occurred:
I fixed that one via this thread as well.

Now, running `python main.py train --pipeline_name unet_weighted` no longer raises any errors, but training does not seem to start at all (nothing is printed for epoch 0).
Here is the full console output:

```
/home/USER/Developer/anaconda3/envs/mapping/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/home/USER/Developer/ML/open-solution-mapping-challenge/src/utils.py:132: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
SHOW-325
https://ui.neptune.ml/shared/showroom/e/SHOW-325
2019-09-26 01-18-48 mapping-challenge >>> training
2019-09-26 01-18-54 steps >>> step xy_train adapting inputs
2019-09-26 01-18-54 steps >>> step xy_train transforming...
2019-09-26 01-18-54 steps >>> step xy_inference adapting inputs
2019-09-26 01-18-54 steps >>> step xy_inference transforming...
2019-09-26 01-18-54 steps >>> step loader adapting inputs
2019-09-26 01-18-54 steps >>> step loader transforming...
2019-09-26 01-18-54 steps >>> step unet unpacking inputs
2019-09-26 01-18-54 steps >>> step unet loading transformer...
2019-09-26 01-18-55 steps >>> step unet transforming...
2019-09-26 01-18-58 steps >>> step mask_resize adapting inputs
2019-09-26 01-18-58 steps >>> step mask_resize transforming...
100%|##########| 16/16 [00:01<00:00,  8.38it/s]
2019-09-26 01-18-59 steps >>> step mask_resize caching outputs...
2019-09-26 01-18-59 steps >>> step category_mapper adapting inputs
2019-09-26 01-18-59 steps >>> step category_mapper transforming...
100%|##########| 16/16 [00:00<00:00, 1761.53it/s]
2019-09-26 01-19-00 steps >>> step mask_erosion adapting inputs
2019-09-26 01-19-00 steps >>> step mask_erosion transforming...
100%|##########| 16/16 [00:00<00:00, 136956.87it/s]
2019-09-26 01-19-00 steps >>> step labeler adapting inputs
2019-09-26 01-19-00 steps >>> step labeler transforming...
100%|##########| 16/16 [00:00<00:00, 132.53it/s]
2019-09-26 01-19-00 steps >>> step mask_dilation adapting inputs
2019-09-26 01-19-00 steps >>> step mask_dilation transforming...
100%|##########| 16/16 [00:00<00:00, 92.15it/s]
2019-09-26 01-19-00 steps >>> step mask_resize loading output...
2019-09-26 01-19-00 steps >>> step score_builder adapting inputs
2019-09-26 01-19-00 steps >>> step score_builder transforming...
100%|##########| 16/16 [00:00<00:00, 18.44it/s]
2019-09-26 01-19-01 steps >>> step output adapting inputs
2019-09-26 01-19-01 steps >>> step output transforming...
(mapping) USER@debian:~/Developer/ML/open-solution-mapping-challenge$
```

No errors are reported, but training does not seem to start. Do you have any idea why this happens? Thank you.
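The console output above offers one clue: the `unet` step logs `loading transformer...` followed by `transforming...`, and no step in the run logs any fitting. Below is a minimal sketch for checking this in a saved log. It assumes only the `steps >>> step <name> <action>...` line format visible in that output; the helper names `step_actions` and `steps_that_loaded_transformers` are hypothetical and not part of steppy or this repository.

```python
import re
from collections import defaultdict

# Matches log lines like
# "2019-09-26 01-18-54 steps >>> step unet loading transformer..."
# (format taken from the console output in this issue).
STEP_LINE = re.compile(r"steps >>> step (\S+) (.+?)\.\.\.$")


def step_actions(log_text):
    """Map each step name to the '...'-terminated actions it logged, in order."""
    actions = defaultdict(list)
    for line in log_text.splitlines():
        match = STEP_LINE.search(line.strip())
        if match:
            actions[match.group(1)].append(match.group(2))
    return dict(actions)


def steps_that_loaded_transformers(log_text):
    """Return steps that logged 'loading transformer' (i.e. reused a saved one)."""
    return [name for name, acts in step_actions(log_text).items()
            if "loading transformer" in acts]


# Excerpt from the console output above.
log = """\
2019-09-26 01-18-54 steps >>> step loader transforming...
2019-09-26 01-18-54 steps >>> step unet unpacking inputs
2019-09-26 01-18-54 steps >>> step unet loading transformer...
2019-09-26 01-18-55 steps >>> step unet transforming...
2019-09-26 01-19-00 steps >>> step mask_resize loading output...
"""

print(steps_that_loaded_transformers(log))  # ['unet']
```

If the step you expected to train appears in this list, the pipeline seems to have reloaded a previously saved transformer instead of fitting it. Whether that is the actual cause here is not confirmed in this thread.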

@zeciro

zeciro commented Mar 24, 2021

Hi,
I encountered the same issue.
`python main.py train --pipeline_name unet` works fine, but with `unet_weighted` training does not start.

Have you found a fix?

@data-overload

I'm also trying to train on my own images starting from either of the contest weights (unet or scoring_model), but I'm not sure how to load those weights so that training continues from them.

@zeciro Running that command with either `unet` or `unet_weighted` gives essentially the same console output as the OP's (no errors, but no output seems to be produced).
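A related check is whether a saved transformer for the step already exists on disk. The sketch below assumes, without confirmation from this thread, that each persisted step lives under `<experiment_dir>/transformers/<step_name>`; both that folder layout and the `persisted_transformers` helper are hypothetical.

```python
import os
import tempfile


def persisted_transformers(experiment_dir):
    """Return the names of saved transformers under <experiment_dir>/transformers.

    ASSUMPTION: each fitted step is saved at
    <experiment_dir>/transformers/<step_name>; check this against your setup.
    """
    transformers_dir = os.path.join(experiment_dir, "transformers")
    if not os.path.isdir(transformers_dir):
        return []  # no saved transformers found
    return sorted(os.listdir(transformers_dir))


# Demo on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as exp_dir:
    print(persisted_transformers(exp_dir))  # []
    os.makedirs(os.path.join(exp_dir, "transformers"))
    open(os.path.join(exp_dir, "transformers", "unet"), "w").close()
    print(persisted_transformers(exp_dir))  # ['unet']
```

If `unet` shows up when pointed at the real experiment directory, that would be consistent with the `step unet loading transformer...` line in the OP's log.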
