Trainer doesn't finish at last step. #60

@roperi

Version (branch): data_backend

Once the trainer has gone through all of its total steps, it keeps running additional steps instead of stopping.

08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:25 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
Steps: : 624it [2:35:21, 23.78s/it, lr=1.5e-6, step_loss=0.19]  08/27/2023 13:50:28 - DEBUG - __main__ - Starting into epoch: 98
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:28 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:29 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
Steps: : 624it [2:35:25, 23.78s/it, lr=1.5e-6, step_loss=0.115]08/27/2023 13:50:32 - DEBUG - __main__ - Starting into epoch: 99
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:32 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size before rotation: (768, 768)
08/27/2023 13:50:33 - DEBUG - MultiaspectImage - Image size after rotation: (768, 768)

In fact, the trainer continues beyond NUM_EPOCHS and still generates validation samples every VALIDATION_STEPS. I set MAX_NUM_STEPS to an incredibly high number, so I wonder whether it is restarting all over using MAX_NUM_STEPS.
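
To make the expected behaviour concrete, here is a minimal sketch of the stop condition I would expect. It is not the trainer's actual code. It assumes the run should end at whichever of NUM_EPOCHS or MAX_NUM_STEPS is reached first, and that a non-positive value means "no limit". The names `resolve_stop_step`, `train`, and `steps_per_epoch` are hypothetical and used only for illustration.

```python
def resolve_stop_step(num_epochs, max_num_steps, steps_per_epoch):
    """Return the global step at which training should stop.

    Assumption (not taken from the trainer): the tighter of the two
    limits wins, and a value <= 0 means that limit is disabled.
    """
    limits = []
    if num_epochs > 0:
        limits.append(num_epochs * steps_per_epoch)
    if max_num_steps > 0:
        limits.append(max_num_steps)
    if not limits:
        raise ValueError("set NUM_EPOCHS or MAX_NUM_STEPS to a positive value")
    return min(limits)


def train(num_epochs, max_num_steps, steps_per_epoch):
    """Toy loop: counts steps instead of running real optimizer updates."""
    stop_at = resolve_stop_step(num_epochs, max_num_steps, steps_per_epoch)
    global_step = 0
    epoch = 0
    # The outer loop is bounded by the step budget, so a very large
    # MAX_NUM_STEPS cannot extend the run past NUM_EPOCHS.
    while global_step < stop_at:
        epoch += 1
        for _ in range(steps_per_epoch):
            global_step += 1  # stand-in for one training step
            if global_step >= stop_at:
                break  # leave the epoch as soon as the budget is spent
    return global_step, epoch


if __name__ == "__main__":
    # Huge MAX_NUM_STEPS, small NUM_EPOCHS: the epoch limit decides.
    print(train(num_epochs=3, max_num_steps=10**9, steps_per_epoch=8))
```

With a loop shaped like this, an extremely high MAX_NUM_STEPS would have no effect once NUM_EPOCHS worth of steps has been completed, which is the behaviour I expected from the run above.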

Labels: 1 / 0 magic, not reliably reproducible, bug