Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PlaceholderIds in Actions and Plans (instead of full Placeholders) #3223

Merged
merged 18 commits into from
Mar 25, 2020

Conversation

Jasopaum
Copy link
Member

Introduce a class that could be used as id instead of the current string | int.

The rationale behind that was to disambiguate id and string | int in messages and actions.
For instance, we could have only placeholder ids (as ObjectIds) in Plan's actions instead of the whole placeholders and we wouldn't take them for string or integers.

@Jasopaum Jasopaum requested a review from a team as a code owner March 18, 2020 22:32
worker._tmp_placeholders[tensor_id] = tensor

return worker._tmp_placeholders[tensor_id]
return PlaceHolder(owner=worker, id=tensor_id, tags=tags, description=description)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you make sure you don't create 2 PlaceHolder with the same tags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't, but I don't think it was done before neither

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check to avoid duplicate placeholders moves after the deserialization. Since Actions no longer directly contain Placeholders (which move to a field on Plan), we don't have to worry about deserializing the same Placeholder multiple times as long as the property on Plan doesn't contain duplicates.

"""
if isinstance(obj, (tuple, list)):
r = [self.replace_with_placeholders(o, **kw) for o in obj]
r = [self.replace_with_placeholder_ids(o, **kw) for o in obj]
Copy link
Contributor

@LaRiffle LaRiffle Mar 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the strategy: you're now storing ObjectIds instead of placeholders right? What are the benefits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You remember the self.worker._tmp_placeholders variable that held placeholders during deserialization in order to avoid creating multiple identical Placeholders, because duplicate Placeholders breaks instantiation with actual tensors? This is a strategy for getting rid of that temp variable, by storing Placeholders at the Plan level and using that store to avoid duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ids here makes it possible to deserialize without creating duplicate Placeholders and then index into the Placeholder store on the Plan to make sure that all the places that reference the same id use the same Placeholder object.

# Replace non-instanciated placeholders from plan.placeholders by instanciated placeholders
# from state.state_placeholders
# TODO this definitely isn't the right strategy. Maybe state shouldn't contain
# instanciated placeholders but values directly?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda confused by this part. Not sure I have my head fully wrapped around why State contains Placeholders. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Théo who told me that it was to avoid tensor-placeholder operations.
If state contains placeholders, then we can do operation between the plan's placeholders and the ones in its state.
But I agree, Placeholders shouldn't have anything to do with States as they're static

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm okay with this strategy for now. Based on Slack conversations, sounds like we should leave State containing instantiated Placeholders for the time being.

@Jasopaum Jasopaum changed the title [WIP] Start experimenting a ObjectId class Start experimenting with an ObjectId class Mar 21, 2020
@codecov
Copy link

codecov bot commented Mar 21, 2020

Codecov Report

Merging #3223 into master will increase coverage by 0.04%.
The diff coverage is 96.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3223      +/-   ##
==========================================
+ Coverage   94.54%   94.59%   +0.04%     
==========================================
  Files         146      147       +1     
  Lines       15897    15954      +57     
==========================================
+ Hits        15030    15091      +61     
+ Misses        867      863       -4     
Impacted Files Coverage Δ
syft/execution/computation.py 93.33% <84.21%> (-1.12%) ⬇️
syft/execution/placeholder_id.py 96.42% <96.42%> (ø)
syft/execution/placeholder.py 91.22% <100.00%> (-0.96%) ⬇️
syft/execution/plan.py 92.79% <100.00%> (+2.91%) ⬆️
syft/execution/state.py 91.22% <100.00%> (+0.07%) ⬆️
syft/serde/msgpack/serde.py 95.94% <100.00%> (+0.02%) ⬆️
syft/serde/protobuf/proto.py 100.00% <100.00%> (ø)
syft/serde/protobuf/serde.py 63.81% <100.00%> (+1.23%) ⬆️
test/serde/msgpack/test_msgpack_serde_full.py 100.00% <100.00%> (ø)
test/serde/protobuf/test_protobuf_serde_full.py 100.00% <100.00%> (ø)
... and 8 more

@karlhigley karlhigley added plans Type: Refactor 🔨 A complete overhaul of a file, feature, or codebase labels Mar 22, 2020
Copy link
Member

@iamtrask iamtrask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am too stretched thin to do a blocking review today.

Copy link
Contributor

@karlhigley karlhigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure that changes are required, but left a few comments for discussion and re-blocking this PR after premature approval.

syft/execution/computation.py Outdated Show resolved Hide resolved
syft/execution/computation.py Outdated Show resolved Hide resolved
@@ -20,6 +21,10 @@ def __init__(self, owner=None, id=None, tags: set = None, description: str = Non
"""
super().__init__(id=id, owner=owner, tags=tags, description=description)

# TODO currently we only use ObjectId on placeholder but if we use them, we
# should use them everywhere
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObjectId might be too general as a name for this concept, given the current scope of use. Maybe we could call them PlaceholderIds for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping for a quick extension to all objects because I don't want to have Placeholder as a different object... But I can change the naming for now and change it back when we extend it to other objects.
What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still beneficial to know what type the id goes with somehow. Using a separate class allows us to put that information into type annotations, which is kinda nice. On the other hand, a proliferation of tiny id classes might not be great. 🤷‍♂

Could there be a base Id class that defines the fields/behaviors and then a bunch of sub-classes that only exist to capture/enforce id types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I don't think Ids will have lots of functionalities so I'm not a fan of having a Id class for each PySyft object...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could kick this can down the road by calling it PlaceholderId for now. Since a one-to-one correspondence is required between Python classes and Protobuf schemas, that would require adding a schema for PlaceholderId though.

It's not perfect and we may just refactor it out later, but it keeps the scope of this PR limited and make it easier to think through the potential implications.

@@ -274,6 +275,20 @@ def replace_with_placeholders(self, obj, arg_ids, result_ids, **kw):
else:
return None

def replace_ids_with_placeholders(self, obj):
"""
Replace in an object all ids with Placeholders
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only applies to ObjectIds/PlaceholderIds now, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! All objects in Plan actions should be placeholders (except constants) and they are all encoded with an ObjectId

# Replace non-instanciated placeholders from plan.placeholders by instanciated placeholders
# from state.state_placeholders
# TODO this definitely isn't the right strategy. Maybe state shouldn't contain
# instanciated placeholders but values directly?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm okay with this strategy for now. Based on Slack conversations, sounds like we should leave State containing instantiated Placeholders for the time being.

# Replace non-instanciated placeholders from plan.placeholders by instanciated placeholders
# from state.state_placeholders
# TODO this definitely isn't the right strategy. Maybe state shouldn't contain
# instanciated placeholders but values directly?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach is okay, but wonder if this makes sense to extract as a method since it happens in multiple places.

syft/execution/state.py Outdated Show resolved Hide resolved
@@ -56,6 +57,7 @@
torch.Size: SizePB,
# Syft types
AdditiveSharingTensor: AdditiveSharingTensorPB,
ObjectId: IdPB, # TODO harmonize naming between syft and proto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to do this without breaking everything would be to create an ObjectId/PlaceholderId schema on the Protobuf side that contains an Id field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How bad would it be to rename protobuf's Id to ObjectId/PlaceholderId?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda bad? It's not really intended for either of those specific purposes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if ObjectId was used everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were going to do that, we'd probably want to rename the Python class to Id, because I'm not sure everything that uses an Id is actually using it as the id for an object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Still seems like we might want a way to immediately tell what type an id goes with from the class though.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get why we would need to know which type the goes to 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons I can think of:

  • It's handy for serialization/deserialization.
  • It prevents people from misusing the ids from one class to load objects of another class (to prevent weird collisions of both accidental and intentional natures.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do you think serialization/deserialization could differ between different id classes?
  • Hum, I agree on this point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on whether they're represented by separate classes, I suppose. If they are, then seems like the serialization could be different.

Copy link
Contributor

@karlhigley karlhigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small nits, but this looks really really good. Thanks for taking this experiment and running with it! 😃

syft/execution/computation.py Show resolved Hide resolved
syft/execution/placeholder_id.py Outdated Show resolved Hide resolved
syft/execution/plan.py Show resolved Hide resolved
syft/serde/protobuf/proto.py Show resolved Hide resolved
@karlhigley karlhigley changed the title Start experimenting with an ObjectId class Use PlaceholderIds in Actions and Plans (instead of full Placeholders) Mar 24, 2020
@karlhigley karlhigley merged commit 1097935 into OpenMined:master Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Refactor 🔨 A complete overhaul of a file, feature, or codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants