Add a parameter to batch all garbage collection calls every n seconds #3805
syft/workers/base.py
Outdated
@@ -119,6 +121,7 @@ def __init__(
self.auto_add = auto_add
self._message_pending_time = message_pending_time
self.msg_history = []
self.trash = {}
Q: Wouldn't this be a better attribute for the storage class?
Great idea!
syft/__init__.py
Outdated
@@ -140,3 +140,6 @@ def pool():
from syft.generic.id_provider import IdProvider

ID_PROVIDER = IdProvider()

# Garbage collect all remote data on a worker every garbage_delay seconds
garbage_delay = 0
Q: Wouldn't this be a better attribute for the storage class?
👍
LGTM! However, I left a small remark regarding the delay. I also think the size of the batch matters: for example, if you have a huge number of messages to send but haven't reached the delay yet, you may want to send them anyway to free some memory on the remote worker.
syft/workers/base.py
Outdated
trash[location.id][1].append(object_id)

if (time.time() - trash[location.id][0]) > delay:
With the current delay = 0, this will always be the case, so I guess it won't do any batching.
I need a default delay = 0 for the GC tests to pass.
Yeah, I could add an extra argument for the maximum capacity of the trash!
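To make the two flush conditions discussed in this thread concrete, here is a minimal standalone sketch, not the code from this PR. `TrashBatcher`, `send`, `capacity`, and `clock` are hypothetical names introduced for illustration; only the `(timestamp, [ids])` layout of `trash` comes from the diff above. The sketch compares with `>=` rather than `>` so that `delay = 0` flushes on every call regardless of clock resolution.

```python
import time
from typing import Callable, Dict, List, Tuple


class TrashBatcher:
    """Hypothetical stand-in for the batching logic; not PySyft's API."""

    def __init__(
        self,
        send: Callable[[str, List[int]], None],
        delay: float = 0.0,
        capacity: int = 10_000,
        clock: Callable[[], float] = time.time,
    ):
        self.send = send          # called with (location_id, ids) on flush
        self.delay = delay        # seconds to wait before flushing a batch
        self.capacity = capacity  # max ids held per location before flushing
        self.clock = clock        # injectable so tests need not sleep
        self.trash: Dict[str, Tuple[float, List[int]]] = {}

    def garbage(self, location_id: str, object_id: int) -> None:
        if location_id not in self.trash:
            self.trash[location_id] = (self.clock(), [])

        started, ids = self.trash[location_id]
        ids.append(object_id)

        # Flush when the batch is old enough OR has grown too large.
        too_old = (self.clock() - started) >= self.delay
        too_big = len(ids) >= self.capacity
        if too_old or too_big:
            self.send(location_id, ids)
            # Start a fresh batch with a new timestamp.
            self.trash[location_id] = (self.clock(), [])
```

With `delay=0` every call flushes immediately, which keeps the existing behaviour; a positive `delay` batches ids until either the time or the capacity limit is hit.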
Codecov Report
@@            Coverage Diff             @@
##           master    #3805      +/-   ##
==========================================
+ Coverage   94.70%   95.15%   +0.44%
==========================================
  Files         187      186       -1
  Lines       18946    18909      -37
==========================================
+ Hits        17943    17992      +49
+ Misses       1003      917      -86
LGTM!
LGTM!
pip-dep/requirements.txt
Outdated
@@ -12,7 +12,7 @@ psutil==5.7.0
requests~=2.22.0
requests-toolbelt==0.9.1
scipy~=1.4.1
syft-proto==0.4.9
git+https://github.com/openmined/syft-proto.git
Need to talk with @vvmnnnkv to do a new release?
y_ptr = y.send(bob)
del y_ptr

assert x.id not in bob.object_store._objects
Q: Maybe we can also check y? (y should be deleted, right?)
max_size = self.object_store.trash_capacity
trash = self.object_store.trash

if location.id not in trash:
PICKY: Make trash a defaultdict.
trash = defaultdict(lambda: (time.time(), []))
Whenever you access a key that does not exist yet, it will automatically create that tuple.
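As a rough illustration of the suggestion above, the sketch below shows how a `defaultdict` removes the need for the `if location.id not in trash` check. `add_to_trash` and `flush` are hypothetical helper names, not functions from this PR. Because a tuple is immutable, the timestamp cannot be updated in place, so this sketch drops the entry on flush and lets the next access create a fresh one.

```python
import time
from collections import defaultdict

# A missing key is created on first access as (creation_time, []).
trash = defaultdict(lambda: (time.time(), []))


def add_to_trash(location_id, object_id):
    """Append an id to the batch for a location (hypothetical helper)."""
    started, ids = trash[location_id]  # creates the entry if absent
    ids.append(object_id)              # the list inside the tuple is mutable
    return started, ids


def flush(location_id):
    """Return the batched ids and drop the entry (hypothetical helper).

    pop() with a default does not trigger the default factory, so flushing
    an unknown location leaves the dict untouched.
    """
    _, ids = trash.pop(location_id, (None, []))
    return ids
```

Note that membership tests such as `location_id in trash` do not create entries; only indexing with `trash[location_id]` does.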
Description
This will help reduce the number of messages sent to garbage-collect remote values.
How has this been tested?
Checklist