Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed global step update when the epoch is skipped ([#7677](https://github.com/PyTorchLightning/pytorch-lightning/pull/7677))


- Fixed dev debugger memory growing due to tracking events even when disabled ([#7875](https://github.com/PyTorchLightning/pytorch-lightning/pull/7875))


- Fixed training loop total batch counter when accumulate grad batches was enabled ([#7692](https://github.com/PyTorchLightning/pytorch-lightning/pull/7692))


Expand Down
1 change: 1 addition & 0 deletions pytorch_lightning/utilities/debugging.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ def __init__(self, trainer):
self.test_dataloader_calls = []
self.dataloader_sequence_calls = []

@enabled_only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this debugger used only for tests? should we look at removing this dependency and rebuilding event tracking later on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not even. It was used in the past for debugging, but we found out it was risky to rely on it for testing.
I am not sure of its utility right now and could be deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all the usages of the dev debugger in tests.

tests/callbacks/test_early_stopping.py:111:    assert len(trainer.dev_debugger.early_stopping_history) == expected_count
tests/callbacks/test_stochastic_weight_avg.py:100:            assert trainer.dev_debugger.count_events(
tests/plugins/test_rpc_sequential_plugin.py:46:        assert len(trainer.dev_debugger.pbar_added_metrics) > 0
tests/plugins/test_rpc_sequential_plugin.py:91:        assert len(trainer.dev_debugger.pbar_added_metrics) > 0
tests/models/test_amp.py:216:    assert trainer.dev_debugger.count_events('AMP') == 0
tests/models/test_amp.py:249:    assert trainer.dev_debugger.count_events('AMP') == 10
tests/checkpointing/test_model_checkpoint.py:143:    lr_scheduler_debug = trainer.dev_debugger.saved_lr_scheduler_updates
tests/checkpointing/test_model_checkpoint.py:249:    lr_scheduler_debug = trainer.dev_debugger.saved_lr_scheduler_updates
tests/checkpointing/test_model_checkpoint.py:867:    assert len(trainer.dev_debugger.checkpoint_callback_history) == 3
tests/checkpointing/test_model_checkpoint.py:954:    assert trainer.dev_debugger.checkpoint_callback_history[-1]['epoch'] == len(monitor) - 1
tests/checkpointing/test_checkpoint_callback_frequency.py:38:    assert len(trainer.dev_debugger.checkpoint_callback_history) == 0
tests/checkpointing/test_checkpoint_callback_frequency.py:47:    assert len(trainer.dev_debugger.checkpoint_callback_history) == 0
tests/trainer/test_dataloaders.py:527:        assert len(trainer.dev_debugger.num_seen_val_check_batches) == num_val_dataloaders
tests/trainer/test_dataloaders.py:528:        for dataloader_idx, num_batches in trainer.dev_debugger.num_seen_val_check_batches.items():
tests/trainer/test_dataloaders.py:532:        assert len(trainer.dev_debugger.num_seen_test_check_batches) == num_test_dataloaders
tests/trainer/test_dataloaders.py:533:        for dataloader_idx, num_batches in trainer.dev_debugger.num_seen_test_check_batches.items():
tests/trainer/test_dataloaders.py:594:        assert trainer.dev_debugger.num_seen_sanity_check_batches == trainer.num_sanity_val_steps * num_val_dataloaders
tests/trainer/test_dataloaders.py:1282:    assert len(trainer.dev_debugger.val_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1283:    assert len(trainer.dev_debugger.test_dataloader_calls) == 0
tests/trainer/test_dataloaders.py:1284:    assert len(trainer.dev_debugger.train_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1287:    calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1315:    assert len(trainer.dev_debugger.val_dataloader_calls) == 10
tests/trainer/test_dataloaders.py:1316:    assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1317:    assert len(trainer.dev_debugger.train_dataloader_calls) == 3
tests/trainer/test_dataloaders.py:1320:    calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1357:    assert len(trainer.dev_debugger.val_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1358:    assert len(trainer.dev_debugger.test_dataloader_calls) == 0
tests/trainer/test_dataloaders.py:1359:    assert len(trainer.dev_debugger.train_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1362:    calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1389:    assert len(trainer.dev_debugger.val_dataloader_calls) == 4
tests/trainer/test_dataloaders.py:1390:    assert len(trainer.dev_debugger.train_dataloader_calls) == 3
tests/trainer/test_dataloaders.py:1391:    assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1394:    calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1438:    assert len(trainer.dev_debugger.val_dataloader_calls) == 4
tests/trainer/test_dataloaders.py:1439:    assert len(trainer.dev_debugger.train_dataloader_calls) == 3
tests/trainer/test_dataloaders.py:1440:    assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1443:    calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1490:    assert len(trainer.dev_debugger.val_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1491:    assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1492:    assert len(trainer.dev_debugger.train_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1495:    calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/flags/test_fast_dev_run.py:102:        assert len(trainer.dev_debugger.checkpoint_callback_history) == 0
tests/trainer/flags/test_fast_dev_run.py:106:        assert len(trainer.dev_debugger.early_stopping_history) == 0
tests/trainer/optimization/test_manual_optimization.py:98:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:130:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:161:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:187:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:225:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:502:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:577:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * 2
tests/trainer/optimization/test_manual_optimization.py:636:    assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * 2

Most could be replaced with progress tracking. Others could have the test changed to track manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, dev debugger should not exist at all. we want to rely on the existing built in, well tested unit testing framework.

def track_event(
self,
evt_type: str,
Expand Down