-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PlaceholderIds
in Actions
and Plans
(instead of full Placeholders
)
#3223
Conversation
worker._tmp_placeholders[tensor_id] = tensor | ||
|
||
return worker._tmp_placeholders[tensor_id] | ||
return PlaceHolder(owner=worker, id=tensor_id, tags=tags, description=description) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you make sure you don't create 2 PlaceHolder with the same tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't, but I don't think it was done before neither
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check to avoid duplicate placeholders moves after the deserialization. Since Actions
no longer directly contain Placeholders
(which move to a field on Plan
), we don't have to worry about deserializing the same Placeholder
multiple times as long as the property on Plan
doesn't contain duplicates.
syft/execution/plan.py
Outdated
""" | ||
if isinstance(obj, (tuple, list)): | ||
r = [self.replace_with_placeholders(o, **kw) for o in obj] | ||
r = [self.replace_with_placeholder_ids(o, **kw) for o in obj] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the strategy: you're now storing ObjectIds instead of placeholders right? What are the benefits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You remember the self.worker._tmp_placeholders
variable that held placeholders during deserialization in order to avoid creating multiple identical Placeholders
, because duplicate Placeholders
breaks instantiation with actual tensors? This is a strategy for getting rid of that temp variable, by storing Placeholders
at the Plan
level and using that store to avoid duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ids here makes it possible to deserialize without creating duplicate Placeholders
and then index into the Placeholder
store on the Plan
to make sure that all the places that reference the same id use the same Placeholder
object.
syft/execution/plan.py
Outdated
# Replace non-instanciated placeholders from plan.placeholders by instanciated placeholders | ||
# from state.state_placeholders | ||
# TODO this definitely isn't the right strategy. Maybe state shouldn't contain | ||
# instanciated placeholders but values directly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kinda confused by this part. Not sure I have my head fully wrapped around why State contains Placeholders. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked Théo who told me that it was to avoid tensor-placeholder operations.
If state contains placeholders, then we can do operation between the plan's placeholders and the ones in its state.
But I agree, Placeholders shouldn't have anything to do with States as they're static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm okay with this strategy for now. Based on Slack conversations, sounds like we should leave State
containing instantiated Placeholders
for the time being.
f0a01d9
to
da9ed69
Compare
Codecov Report
@@ Coverage Diff @@
## master #3223 +/- ##
==========================================
+ Coverage 94.54% 94.59% +0.04%
==========================================
Files 146 147 +1
Lines 15897 15954 +57
==========================================
+ Hits 15030 15091 +61
+ Misses 867 863 -4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am too stretched thin to do a blocking review today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure that changes are required, but left a few comments for discussion and re-blocking this PR after premature approval.
syft/execution/placeholder.py
Outdated
@@ -20,6 +21,10 @@ def __init__(self, owner=None, id=None, tags: set = None, description: str = Non | |||
""" | |||
super().__init__(id=id, owner=owner, tags=tags, description=description) | |||
|
|||
# TODO currently we only use ObjectId on placeholder but if we use them, we | |||
# should use them everywhere |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObjectId
might be too general as a name for this concept, given the current scope of use. Maybe we could call them PlaceholderIds
for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping for a quick extension to all objects because I don't want to have Placeholder as a different object... But I can change the naming for now and change it back when we extend it to other objects.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still beneficial to know what type the id goes with somehow. Using a separate class allows us to put that information into type annotations, which is kinda nice. On the other hand, a proliferation of tiny id classes might not be great. 🤷♂
Could there be a base Id
class that defines the fields/behaviors and then a bunch of sub-classes that only exist to capture/enforce id types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I don't think Ids
will have lots of functionalities so I'm not a fan of having a Id class for each PySyft object...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could kick this can down the road by calling it PlaceholderId
for now. Since a one-to-one correspondence is required between Python classes and Protobuf schemas, that would require adding a schema for PlaceholderId
though.
It's not perfect and we may just refactor it out later, but it keeps the scope of this PR limited and make it easier to think through the potential implications.
@@ -274,6 +275,20 @@ def replace_with_placeholders(self, obj, arg_ids, result_ids, **kw): | |||
else: | |||
return None | |||
|
|||
def replace_ids_with_placeholders(self, obj): | |||
""" | |||
Replace in an object all ids with Placeholders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only applies to ObjectIds
/PlaceholderIds
now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! All objects in Plan actions should be placeholders (except constants) and they are all encoded with an ObjectId
syft/execution/plan.py
Outdated
# Replace non-instanciated placeholders from plan.placeholders by instanciated placeholders | ||
# from state.state_placeholders | ||
# TODO this definitely isn't the right strategy. Maybe state shouldn't contain | ||
# instanciated placeholders but values directly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm okay with this strategy for now. Based on Slack conversations, sounds like we should leave State
containing instantiated Placeholders
for the time being.
syft/execution/plan.py
Outdated
# Replace non-instanciated placeholders from plan.placeholders by instanciated placeholders | ||
# from state.state_placeholders | ||
# TODO this definitely isn't the right strategy. Maybe state shouldn't contain | ||
# instanciated placeholders but values directly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the approach is okay, but wonder if this makes sense to extract as a method since it happens in multiple places.
syft/serde/protobuf/proto.py
Outdated
@@ -56,6 +57,7 @@ | |||
torch.Size: SizePB, | |||
# Syft types | |||
AdditiveSharingTensor: AdditiveSharingTensorPB, | |||
ObjectId: IdPB, # TODO harmonize naming between syft and proto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to do this without breaking everything would be to create an ObjectId
/PlaceholderId
schema on the Protobuf side that contains an Id
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How bad would it be to rename protobuf's Id
to ObjectId
/PlaceholderId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda bad? It's not really intended for either of those specific purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if ObjectId was used everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were going to do that, we'd probably want to rename the Python class to Id
, because I'm not sure everything that uses an Id
is actually using it as the id for an object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Still seems like we might want a way to immediately tell what type an id goes with from the class though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why we would need to know which type the goes to 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two reasons I can think of:
- It's handy for serialization/deserialization.
- It prevents people from misusing the ids from one class to load objects of another class (to prevent weird collisions of both accidental and intentional natures.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do you think serialization/deserialization could differ between different id classes?
- Hum, I agree on this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on whether they're represented by separate classes, I suppose. If they are, then seems like the serialization could be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small nits, but this looks really really good. Thanks for taking this experiment and running with it! 😃
PlaceholderIds
in Actions
and Plans
(instead of full Placeholders
)
Introduce a class that could be used as id instead of the current string | int.
The rationale behind that was to disambiguate id and string | int in messages and actions.
For instance, we could have only placeholder ids (as ObjectIds) in Plan's actions instead of the whole placeholders and we wouldn't take them for string or integers.