New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Functional api/identify effect #640
Conversation
Signed-off-by: Andres Morales <andresmor@microsoft.com>
Signed-off-by: Andres Morales <andresmor@microsoft.com>
Signed-off-by: Andres Morales <andresmor@microsoft.com>
dd4b31d
to
6d1a464
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andresmor-ms! Overall, this looks great! One change is that I don't think that frontdoor and IV identification should be in a class called backdoor identifier. I see that used to be called CausalIdentifier. I agree that that name is too generic. Let's either rename the backdoor class to something in-between (will think about what might be appropriate); or track an issue to refactor frontdoor and IV out into their own classes.
Signed-off-by: Andres Morales <andresmor@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start. Thanks for adding this @andresmor-ms.
I've added a few comments in my initial review.
docs/source/example_notebooks/dowhy_efficient_backdoor_example.ipynb
Outdated
Show resolved
Hide resolved
Signed-off-by: Andres Morales <andresmor@microsoft.com>
Signed-off-by: Andres Morales <andresmor@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I just added a few comments to improve readability of the notebook.
Post that, we can merge this in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andresmor-ms My apologies for being a bit late in the game here. I missed this one.
I have a very general question: What is the idea behind having classes for identifiers opposed to, say, a simple function? E.g. auto_identify_effect(...)
Typically, we want to use objects, when we plan to give them a longer lifecycle. E.g. we want to pass them around where the consumer is not required to know what kind of identifier it is (IDIdentfier
or AutoIdentifier
). But if our typical usage is
identifier = AutoIdentifier(...)
identifier.identify_effect()
this can be simplified by providing a single call. As an added benefit, a lot less bookkeeping of variables is necessary (in the implementation of the class).
So yea, I'm curious about the usage patterns you have in mind here.
I can see now that the identifiers are actually used in If this is true, I'm wondering: should @amit-sharma That's probably not just a comment for @andresmor-ms, but also a more general question. What do you think? |
No worries. The intention of not refactoring this as functions (and keeping the classes) is mainly to keep backwards compatibility with the object-oriented API, I agree that having functions like As for usage patterns I believe that what we want to achieve is having a single point where you can configure everything (the import dowhy
dowhy.identify_effect(
graph=graph,
treatment=treatment_name,
outcome=outcome_name,
method=AutoIdentifier(...),
) Just as a note, per my understanding what Peter suggest is to use it like this: import dowhy
dowhy.auto_identify_effect(
graph=graph,
treatment=treatment_name,
outcome=outcome_name,
estimand_type=...,
adjustment_method=...,
) (We could still have a general @amit-sharma, @emrekiciman, I'd also like to know your opinion on this. |
Backwards compatibility is important for a while -- we haven't decided how long but I would think at least 6-12 months. Yes, general usage would be per below. For the unfamiliar user, the identify_effect function could call AutoIdentifier as a default. I.e., like this:
Conceptually, I see the identify_effect() function signature developing into a function that accepts arguments that are related to the causal question being asked: treatment, outcome, whether we care about direct/indirect effects, CATE, etc. In contrast, the specific identification methods (AutoIdentifier, IDIdentifier, ...) may also accept configuration arguments (how hard to search, whether to return a maximal or minimal conditioning set) that are more about the identification method rather than the causal question being asked. I expect we'll have more PRs to address this, though. Right now, would be good to keep this PR focused on the functional refactoring? |
@petergtz said:
Peter, I think deprecating the classes is where want to go. I'd probably hold off on marking them as deprecated at least until we complete the functional refactoring of the whole API. I might also suggest not marking them as deprecated until the new API is no longer marked "experimental". I.e., I don't think we want a user to have to choose between using "deprecated" vs "experimental" APIs :D Emre |
@emrekiciman Yes, that's certainly true :-). Also agree with the rest of what you said. It seems like what I'd like to see is that we don't want new library features built heavily on things we plan to deprecate. And that's sometimes hard to keep in mind when plans are made, but their execution takes several months. So comments here would help. I believe we agreed, we would implement the "legacy" API using the new functional API. That's why I suggested to introduce Then we could add a comment on the classes saying that "we plan to deprecate them" and "new code should directly depend on I'm fine with the global function |
This is what I'm thinking to move forward:
What do you think? @emrekiciman @petergtz @amit-sharma |
Sounds great! Thanks, really appreciate your effort here, @andresmor-ms. |
Signed-off-by: Andres Morales <andresmor@microsoft.com>
I'd like to offer a different perspective. I agree on removing CausalModel class and moving to functional top-level user API because the CausalModel class was just a book-keeping class. For the AutoIdentifier/IDIdentifier classes, here's some reasons to keep them as classes (the same arguments may also apply to estimation classes
More generally, I am thinking about an ideal state where we provide a stable, user-facing API for causal tasks. At the same time, we make it super simple for people to add new underlying methods that support the stable API. In that direction, I feel having underlying methods as classes could be a useful abstraction. |
@amit-sharma I'd like to respectfully disagree with the motivation to have a class for simple code re-use. Using inheritance for code re-use has fallen out of favor for multiple reasons in software design. Why that is, has been explained in multiple other places. The Rust book also describes this in the chapter about OOP. Python allows it, but I wouldn't use the feature unnecessarily. And I think what applies for us, is, that requiring to inherit from
I think we discussed to introduce type hints for type-checking. That way we wouldn't require a wrapper for this task. Note that I'm not necessarily opposed to the global
That's indeed a scenario I could see for certain use cases. That's why I asked if we expect this usage pattern in this comment. Note though: we'd have to change the API of those classes nonetheless: right now,
Agree with estimators being classes. The usage is slightly different from identifiers though, and we take advantage of polymorphism (opposed to code re-use): by having an interface/Protocol for estimators we can opaquely pass an estimator we got from fitting the estimand to the |
I don't feel strongly about classes vs global functions, so won't say much about that. Looking at the current code, I do have a question about how we can be clear in the code about the official and supported API, vs. internal functions. For example, let's say we want So, this is maybe more of a python question. What's the way we keep (at least most of) these other functions private, hidden, or otherwise clearly outside of the supported API? |
@emrekiciman https://peps.python.org/pep-0008/#public-and-internal-interfaces provides some guidelines here:
Also:
And finally:
The latter recommendation is often a bit unpractical in my experience: when you prefix a module or function with |
I added the __all__ = [
"AutoIdentifier",
"auto_identify_effect",
"id_identify_effect",
"BackdoorAdjustment",
"EstimandType",
"IdentifiedEstimand",
"IDIdentifier",
"identify_effect",
] I didn't add it to the __all__ = [
"auto_identify_effect",
"id_identify_effect",
"EstimandType",
"IdentifiedEstimand",
"identify_effect",
] Another question would be if we want to remove the ones that we offer on So that the "official" import to get the functions would be: import dowhy
dowhy.auto_identify_effect(...) and not: import dowhy.causal_identifier
dowhy.causal_identifier.identify_effect(...) Or if we want both of them to be supported imports? |
Signed-off-by: Andres Morales <andresmor@microsoft.com>
c59924f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been a robust discussion. Given that the PR has been approved previously by everyone before @andresmor-ms added type hints, I'll go ahead and accept and merge.
Let's leave additional points (the remaining all public interface declarations, further refinement of functional vs class interfaces, demarcation of deprecated and experimental functionality, ...) as future PRs.
Sorry to ask a question on an old MR but I'm trying to understand some of the design direction. I'm having a bit of trouble understanding:
Its not clear to me why these couldn't be replaced by functions? Which code are you trying to maintain backwards compatibility with - users or internally? |
Hey @Padarn We only want to maintain backwards compatibility with the user API, so identifiers have already been replaced by functions. you can take a look at the main branch. E.g., the main identify_effect function. |
Ohh that is much more clear now. Thanks for the clarification @amit-sharma. |
First of a series of PR to add a functional API according to: https://github.com/py-why/dowhy/wiki/API-proposal-for-v1
identify_effect
to have a functional APIBackdoorIdentifier
class and extracted the logic fromCausalIdentifier
to be just a Protocolidentify_effect
method ofBackdoorIdentifier
andIDIdentifier
to take the graph as parameterenums
for easier type checking