drop-in replacements for cross_val_predict and cross_val_score etc #18
Comments
Great suggestion Javier, let me think. This may make things syntactically more convenient or easier for new users, especially regarding example 1 here: https://raamana.github.io/confounds/usage.html. The important question is how we handle more complicated use cases, e.g. advanced cross-validation. If you can quickly sketch an outline of how such a thing could be adapted for cross_val_predict and cross_val_score, it would be easier to decide. It's a tradeoff between how much we encapsulate (plug & play) and how much we stick to a few small modular blocks.
Sure. I think we could more or less "borrow" (i.e. copy) the original sklearn implementations of cross_val_predict and cross_val_score, and just rewrite the auxiliary functions _fit_and_score and _fit_and_predict that they respectively use in each fold iteration. Below is a skeleton example for cross_val_predict. Note that it takes the same arguments as the original sklearn implementation, plus an argument for the confounders. Here I required this new argument to be passed by keyword, not by position like X and y; I think this avoids passing the confounders as X or y by mistake.
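A hedged sketch of what such a wrapper could look like (the name `confound_cross_val_predict` and the use of simple linear residualization as the deconfounding step are assumptions for illustration, not the library's actual implementation):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold


def confound_cross_val_predict(estimator, X, y, *, confounds, cv=None):
    """Like sklearn's cross_val_predict, but residualizes X against
    `confounds` within each fold.  The bare `*` makes `confounds`
    keyword-only, so it cannot be mistaken for X or y by position."""
    cv = cv if cv is not None else KFold(n_splits=5)
    preds = np.empty(len(y), dtype=float)
    for train, test in cv.split(X, y):
        # fit the linear deconfounder on the training fold only,
        # then apply it to both folds (no train/test leakage)
        beta, *_ = np.linalg.lstsq(confounds[train], X[train], rcond=None)
        est = clone(estimator).fit(X[train] - confounds[train] @ beta, y[train])
        preds[test] = est.predict(X[test] - confounds[test] @ beta)
    return preds
```

Calling it with `confounds` as a positional argument raises a TypeError, which is the safeguard described above.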
This is a great idea Javier! Instead of leaving it to the users to do the right thing, we can provide the necessary wrappers for the most common use cases and do the right thing for them. Please go ahead, with one suggestion: name these clearly differently from their sklearn counterparts to avoid any confusion, something like …
Sure, that makes sense. I will start working on it this week.
Ok, thinking about all of this these past few days, I think we need to implement at least the following objects:
All but the last implementation should be simple. As for the last one, it's a shame, but I don't see a quick way of leveraging scikit-learn's original GridSearchCV class to save coding (e.g. by inheriting from it), so maybe the best option is to copy as much code as possible from the original GridSearchCV class and adapt it to our purposes. Finally, I anticipate one case that may give us trouble in the future: in principle, the estimator passed to "DeconfEstimator" could be a pipeline object, so the data would first be deconfounded and then go through the pipeline. The problem arises when the pipeline contains an imputation step to deal with NaNs in the data. In that case, unless confounds can omit the NaNs when deconfounding, I guess it will raise an error. Anyway, we can come back to this case later. I'll start working on this. Follow-up here: https://github.com/jrasero/confounds/tree/sklearn_wrapper
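To make the GridSearchCV difficulty concrete: stock GridSearchCV has no channel for routing a third, per-fold `confounds` array, so a replacement has to own the grid loop itself. Below is a minimal stand-in sketch (the name `deconf_grid_search` and the linear-residualization deconfounding step are hypothetical, for illustration only):

```python
from itertools import product

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold


def deconf_grid_search(estimator, param_grid, X, y, *, confounds, cv=None):
    """Exhaustive grid search with fold-wise deconfounding.
    Returns the best parameter dict and its mean CV score."""
    cv = cv if cv is not None else KFold(n_splits=3)
    best_score, best_params = -np.inf, None
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        scores = []
        for train, test in cv.split(X, y):
            # fit the linear deconfounder on the training fold only
            beta, *_ = np.linalg.lstsq(confounds[train], X[train], rcond=None)
            est = clone(estimator).set_params(**params)
            est.fit(X[train] - confounds[train] @ beta, y[train])
            scores.append(est.score(X[test] - confounds[test] @ beta, y[test]))
        if np.mean(scores) > best_score:
            best_score, best_params = float(np.mean(scores)), params
    return best_params, best_score
```

This forgoes GridSearchCV's parallelism and refit conveniences, which is the cost of not being able to inherit from it.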
Thanks Javier. Most of our existing classes, like Residualize(), are already supposed to be sklearn estimators to the extent possible (what you refer to as a Deconfounded Estimator). I spent a lot of time trying to get them to pass sklearn's tests, but I realized their test suite has many issues and fundamental limitations, so let's not waste time on that. I don't follow the need for a Transformer, as our existing deconfounders are already transformers, right? I can see a clear need for cross_val_predict and GridSearchCV, but perhaps I am missing something, so let's discuss this in more detail before you invest too much time in it.
Yes, yes, the deconfounders will always be transformers, but I was talking about the part that comes after them. For example, a PCA would be a transformer, which has a different API from a classifier or regressor object. But I guess implementing this kind of object can be secondary for now. I agree with you that cross_val_predict and GridSearchCV are the most important pieces right now.
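For illustration, the API difference in question: a transformer exposes `transform` (returns new features), while a predictor exposes `predict` (returns targets). The data here is synthetic, just to show the two call patterns:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = (X[:, 0] > 0).astype(int)

# transformer API: fit, then transform returns a new feature matrix
Z = PCA(n_components=2).fit(X).transform(X)

# predictor API: fit, then predict returns labels
yhat = LogisticRegression().fit(Z, y).predict(Z)
```

A wrapper for the "part after deconfounding" would therefore need to delegate to whichever of these methods the wrapped object provides.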
Perhaps we could consider allowing users to pass an sklearn pipeline object (for preprocessing) prior to deconfounding and before the prediction estimator is applied. Let's start with the simpler case and, based on how it turns out, slowly add more useful features. We certainly don't want to recreate all of sklearn; we want to do things that they can't or won't do.
Hi @jrasero, would you be participating in the OHBM BrainHack virtually? Let me know. I was thinking of picking up a few of the ideas/pending issues here and working on them during the hackathon/conference.
Hi @jrasero, let me know when you have some time today, so we can discuss where you are at and how we can get this finished during this hackathon.
Hey @raamana, I am free now. Let me reach you via email to maybe set up a quick Zoom meeting?
Here is the branch I created several months ago for this issue, and its status as of today: https://github.com/jrasero/confounds/blob/sklearn_wrapper/confounds/sklearn.py Once finished, I'll open the pull request.
Ok @raamana, I finally implemented a few things today. I've also added a few tests for these functionalities. Please take a look and let me know if these look OK. I can open a pull request for all of this if you want.
Fantastic. Please send a PR when you are ready, Javi!
Pradeep,
Could something like this be of interest for the library?
The idea would be to create a class that encapsulates fit and predict, performing the deconfounding and then applying the estimator.
Below is a skeleton example. This would only deconfound the input data.
cross_val_predict and cross_val_score functions could be implemented as well.
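A minimal sketch of what such a class might look like (a hypothetical API, not the actual proposal; it assumes the deconfounding step is a linear residualization fit on the training data):

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class DeconfEstimator(BaseEstimator):
    """Sketch: remove the linear effect of `confounds` from X,
    then delegate to the wrapped estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, *, confounds):
        # learn the confound -> feature mapping on the training data only
        self.beta_, *_ = np.linalg.lstsq(confounds, X, rcond=None)
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X - confounds @ self.beta_, y)
        return self

    def predict(self, X, *, confounds):
        # residualize new data with the coefficients learned in fit
        return self.estimator_.predict(X - confounds @ self.beta_)
```

Note the non-standard `predict` signature: it needs the confounds of the new samples. That deviation from the sklearn API is exactly why stock cross_val_predict and GridSearchCV cannot drive such a class directly, and why the wrappers discussed above are needed.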