Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a parameter to batch all garbage collection calls every n seconds #3805

Merged
merged 10 commits into from
Jul 15, 2020

Conversation

LaRiffle
Copy link
Contributor

@LaRiffle LaRiffle commented Jul 1, 2020

Description

This will help reducing messages to remotely GC values.

How has this been tested?

  • Describe the tests that you ran to verify your changes.
  • Provide instructions so we can reproduce.
  • List any relevant details for your test configuration.

Checklist

@LaRiffle LaRiffle requested a review from a team as a code owner July 1, 2020 10:02
@LaRiffle LaRiffle requested review from youben11 and gmuraru July 1, 2020 10:12
@LaRiffle LaRiffle added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label Jul 1, 2020
@@ -119,6 +121,7 @@ def __init__(
self.auto_add = auto_add
self._message_pending_time = message_pending_time
self.msg_history = []
self.trash = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Wouldn't this be a better attribute for the storage class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

syft/__init__.py Outdated
@@ -140,3 +140,6 @@ def pool():
from syft.generic.id_provider import IdProvider

ID_PROVIDER = IdProvider()

# Garbage colect all remote data on a worker every garbage_delay seconds
garbage_delay = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Wouldn't this be a better attribute for the storage class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@LaRiffle LaRiffle requested a review from gmuraru July 8, 2020 12:20
Copy link
Member

@youben11 youben11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! However, I left a small remark regarding the delay, I also think that the size of the batched objects also counts, for example if you have a huge number of messages to send but didn't reach the delay yet, you may want to send to free some memory in the remote worker


trash[location.id][1].append(object_id)

if (time.time() - trash[location.id][0]) > delay:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current delay = 0, this will always be the case and won't do any batching I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a default delay = 0 to have the gc tests passing.
Yeah I could add an extra argument with the maximum capacity of the trash!

@codecov
Copy link

codecov bot commented Jul 14, 2020

Codecov Report

Merging #3805 into master will increase coverage by 0.44%.
The diff coverage is 96.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3805      +/-   ##
==========================================
+ Coverage   94.70%   95.15%   +0.44%     
==========================================
  Files         187      186       -1     
  Lines       18946    18909      -37     
==========================================
+ Hits        17943    17992      +49     
+ Misses       1003      917      -86     
Impacted Files Coverage Δ
syft/messaging/message.py 92.69% <86.66%> (-0.20%) ⬇️
syft/generic/object_storage.py 97.36% <100.00%> (+0.10%) ⬆️
syft/generic/pointers/object_pointer.py 82.08% <100.00%> (-0.14%) ⬇️
syft/workers/base.py 91.42% <100.00%> (+0.37%) ⬆️
syft/workers/message_handler.py 91.71% <100.00%> (+0.04%) ⬆️
test/generic/test_gc.py 100.00% <100.00%> (ø)
test/serde/serde_helpers.py 100.00% <100.00%> (ø)
...frameworks/torch/tensors/interpreters/gradients.py 94.19% <0.00%> (-0.04%) ⬇️
...ft/frameworks/torch/tensors/interpreters/native.py 91.25% <0.00%> (-0.03%) ⬇️
test/execution/test_translation.py 100.00% <0.00%> (ø)
... and 7 more

Copy link
Member

@youben11 youben11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@gmuraru gmuraru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -12,7 +12,7 @@ psutil==5.7.0
requests~=2.22.0
requests-toolbelt==0.9.1
scipy~=1.4.1
syft-proto==0.4.9
git+https://github.com/openmined/syft-proto.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to talk with @vvmnnnkv to do a new release?

y_ptr = y.send(bob)
del y_ptr

assert x.id not in bob.object_store._objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Maybe we can also check y? (y should be deleted right?)

max_size = self.object_store.trash_capacity
trash = self.object_store.trash

if location.id not in trash:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PICKY: Make trash a defaultdict.
trash = defaultdict(lambda: (time.time(), []))
Whenever you do not have a key and you want to access it, it will automatically create that tuple

@LaRiffle LaRiffle merged commit e5d9a27 into master Jul 15, 2020
@delete-merged-branch delete-merged-branch bot deleted the ryffel/batch-garbage-collect branch July 15, 2020 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: New Feature ➕ Introduction of a completely new addition to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants