
Add force_data_batch_size to ReplayPlugin for manual assignment of data-mem ratio #834

Merged
merged 86 commits into from Jan 7, 2022

Conversation

HamedHemati
Collaborator

No description provided.

@ContinualAI-bot
Collaborator

Oh no! It seems there are some PEP8 errors! 😕
Don't worry, you can fix them! 💪
Here's a report about the errors and where you can find them:

avalanche/benchmarks/utils/data_loader.py:265:81: E501 line too long (83 > 80 characters)
1       E501 line too long (83 > 80 characters)

@coveralls

coveralls commented Nov 19, 2021

Pull Request Test Coverage Report for Build 1482890049

  • 1 of 5 (20.0%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.02%) to 79.921%

Changes Missing Coverage:

| File | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| avalanche/benchmarks/utils/data_loader.py | 0 | 4 | 0.0% |

Files with Coverage Reduction:

| File | New Missed Lines | % |
|---|---|---|
| avalanche/benchmarks/utils/data_loader.py | 1 | 89.68% |
| avalanche/benchmarks/scenarios/generic_definitions.py | 2 | 85.37% |

Totals:

  • Change from base Build 1481778571: -0.02%
  • Covered Lines: 10962
  • Relevant Lines: 13716

💛 - Coveralls

@AntonioCarta
Collaborator

I see that GroupBalancedDataLoader and ReplayDataLoader have different behaviors regarding batch_size. The first creates num_groups dataloaders of batch_size samples each, so the final mini-batch has size num_groups * batch_size. The second uses batch_size as the final size. I think we should make them consistent: either GroupBalancedDataLoader uses batch_size as the total batch size, or ReplayDataLoader creates a minibatch of size 2 * batch_size. Maybe the first option is slightly more intuitive.

What do you think?
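To make the tradeoff concrete, here is a minimal sketch of how a replay loader could divide a fixed batch_size between current data and memory, with an optional override in the spirit of the force_data_batch_size this PR adds. This is not the Avalanche implementation; the helper name split_batch and the even-split default are assumptions for illustration:

```python
def split_batch(batch_size, force_data_batch_size=None):
    """Decide how many slots of a fixed-size minibatch go to current
    data vs. replay memory (hypothetical helper, not Avalanche code)."""
    if force_data_batch_size is not None:
        # Manual data-mem ratio: the user fixes the data share,
        # and memory gets whatever remains.
        if not 0 < force_data_batch_size < batch_size:
            raise ValueError(
                "force_data_batch_size must leave room for memory samples"
            )
        return force_data_batch_size, batch_size - force_data_batch_size
    # Default: split the minibatch as evenly as possible.
    mem = batch_size // 2
    return batch_size - mem, mem
```

Either way, the total stays at batch_size, which matches the "fixed instead of growing" behavior discussed below.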

@HamedHemati
Collaborator Author

> What do you think?

@AntonioCarta I agree with you. Keeping the final batch size equal to the batch_size specified by the user is more intuitive, because that's what one would normally expect (fixed instead of growing). Therefore, I think it's better to set the per-group batch size in GroupBalancedDataLoader to batch_size / num_groups, and it should raise an error when batch_size < num_groups.

@HamedHemati HamedHemati changed the title Add force_data_batch_size to ReplayPlugin for manual assignment of data-mem ratio Add force_data_batch_size to ReplayPlugin for manual assignment of data-mem ratio #833 Nov 26, 2021
@HamedHemati HamedHemati changed the title Add force_data_batch_size to ReplayPlugin for manual assignment of data-mem ratio #833 Add force_data_batch_size to ReplayPlugin for manual assignment of data-mem ratio Nov 26, 2021
@HamedHemati
Collaborator Author

Changes are applied to GroupBalancedDataLoader so that the final batch size is fixed. It checks that batch_size > len(datasets) to ensure that each dataset gets at least one slot.
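The fixed-batch-size behavior described above can be sketched as follows. The helper group_batch_sizes is hypothetical (not the actual Avalanche code): it splits batch_size across the groups, distributes any remainder one sample at a time, and raises when there are more datasets than slots:

```python
def group_batch_sizes(batch_size, num_groups):
    """Split a total batch_size across num_groups dataloaders so the
    final minibatch has exactly batch_size samples (hypothetical helper)."""
    if batch_size < num_groups:
        # Each group needs at least one slot in the minibatch.
        raise ValueError(
            "batch_size must be >= the number of groups/datasets"
        )
    base = batch_size // num_groups
    remainder = batch_size % num_groups
    # The first `remainder` groups get one extra sample each.
    return [base + 1 if i < remainder else base for i in range(num_groups)]
```

With this scheme the per-group sizes always sum to batch_size, so the final minibatch stays fixed regardless of the number of groups.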

AndreaCossu and others added 28 commits December 21, 2021 16:45
Support for ReduceLROnPlateau scheduler
GitBook: fixed a bug in documentation
Prevent TensorboardLogger from blocking the process exit
Fixing issue#771: concat_datasets_sequentially discards classes
@AntonioCarta
Collaborator

@HamedHemati I fixed the test that was not passing so that I can merge the PR.

@AntonioCarta AntonioCarta merged commit 8abc551 into ContinualAI:master Jan 7, 2022