
drop-in replacements for cross_val_predict and cross_val_score etc #18

Open

jrasero opened this issue Feb 25, 2022 · 14 comments

jrasero (Contributor) commented Feb 25, 2022

Pradeep,

Could something like this be of interest for the library?

The idea would be to create a class that encapsulates fit and predict, handling both the deconfounding and the use of the estimator.

Below is a skeleton example. This would only deconfound the input data.

cross_val_predict and cross_val_score functions could be implemented as well.

from sklearn.base import BaseEstimator, clone

class SklearnWrapper(BaseEstimator):

    def __init__(self, deconfounder, estimator):
        self.deconfounder = deconfounder
        self.estimator = estimator

    def fit(self, input_data, target_data, confounders, sample_weight=None):

        # clone the input arguments so repeated fits start from a fresh state
        deconfounder = clone(self.deconfounder)
        estimator = clone(self.estimator)

        # Deconfound the input data
        deconf_input = deconfounder.fit_transform(input_data, confounders)
        self.deconfounder_ = deconfounder

        # Fit the estimator on the deconfounded input data
        estimator.fit(deconf_input, target_data, sample_weight=sample_weight)
        self.estimator_ = estimator

        return self

    def predict(self, input_data, confounders):

        # Deconfound with the already-fitted deconfounder, then predict
        deconf_input = self.deconfounder_.transform(input_data, confounders)
        return self.estimator_.predict(deconf_input)
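For concreteness, here is a hedged, self-contained demo of how such a wrapper could be used. LinearResidualizer is a hypothetical stand-in for a deconfounder from the library (e.g. Residualize), and the wrapper body is a compact copy of the skeleton above:

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import Ridge

class LinearResidualizer(BaseEstimator):
    """Hypothetical deconfounder: removes the linear effect of confounds."""

    def fit(self, X, confounders):
        C = np.column_stack([np.ones(len(confounders)), confounders])
        self.coef_, *_ = np.linalg.lstsq(C, X, rcond=None)
        return self

    def transform(self, X, confounders):
        C = np.column_stack([np.ones(len(confounders)), confounders])
        return X - C @ self.coef_

    def fit_transform(self, X, confounders):
        return self.fit(X, confounders).transform(X, confounders)

class SklearnWrapper(BaseEstimator):
    """Compact copy of the wrapper sketched above, for a runnable demo."""

    def __init__(self, deconfounder, estimator):
        self.deconfounder = deconfounder
        self.estimator = estimator

    def fit(self, X, y, confounders, sample_weight=None):
        self.deconfounder_ = clone(self.deconfounder).fit(X, confounders)
        X_deconf = self.deconfounder_.transform(X, confounders)
        self.estimator_ = clone(self.estimator).fit(
            X_deconf, y, sample_weight=sample_weight)
        return self

    def predict(self, X, confounders):
        return self.estimator_.predict(
            self.deconfounder_.transform(X, confounders))

rng = np.random.default_rng(0)
conf = rng.normal(size=(100, 1))              # one confounding variable
X = rng.normal(size=(100, 3)) + conf          # features contaminated by it
y = X[:, 0] + 0.1 * rng.normal(size=100)

model = SklearnWrapper(LinearResidualizer(), Ridge()).fit(X, y, conf)
preds = model.predict(X, conf)
```

Inheriting from BaseEstimator makes the wrapper itself clonable, which matters later if it is passed into cross-validation utilities.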

raamana (Owner) commented Feb 26, 2022

Great suggestion Javier, let me think. This may make it syntactically more convenient or easier for new users, especially regarding example 1 here: https://raamana.github.io/confounds/usage.html

The important question is how we handle more complicated use cases, e.g. advanced cross-validation. If you can quickly sketch an outline of how such a thing could be adapted for cross_val_predict and cross_val_score, it would be easier to decide. It's a tradeoff between how much we encapsulate (plug-and-play) and how much we stick to a few small modular blocks.

jrasero (Contributor, Author) commented Feb 28, 2022

Sure.

I think we could more or less "borrow" (i.e. copy) the original sklearn implementations of cross_val_predict and cross_val_score and just rewrite the auxiliary functions _fit_and_score and _fit_and_predict that they use in each fold iteration.

Below is a skeleton example for cross_val_predict. Note that it takes the same arguments as the original sklearn implementation, plus an argument for the confounders. I made this new argument keyword-only rather than positional like X and y; I think this avoids passing the confounders as X or y by mistake.

def cross_val_predict(
    estimator,
    X,
    y=None,
    *,
    confounds,  # mandatory, and (IMHO) better passed by keyword only
    groups=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    fit_params=None,
    pre_dispatch="2*n_jobs",
    method="predict",
):
    # ... all initial input checks would go here ...

    # ... some original code from sklearn, including computing `splits`
    #     from the cv object ...

    parallel = Parallel(n_jobs=n_jobs, verbose=verbose,
                        pre_dispatch=pre_dispatch)

    # Here we call our implementation of _fit_and_predict. The "deconf"
    # prefix signals that we also deconfound the data before fitting.
    predictions = parallel(
        delayed(_deconf_fit_and_predict)(
            clone(estimator), X, y, confounds,
            train, test, verbose, fit_params, method
        )
        for train, test in splits
    )

    # ... prepare predictions for output ...

    return predictions


def _deconf_fit_and_predict(estimator,
                            X,
                            y,
                            C,  # confounders
                            train,
                            test,
                            verbose,
                            fit_params,
                            method):

    # Split into training and test sets
    X_train, y_train, C_train = X[train], y[train], C[train]
    X_test, C_test = X[test], C[test]

    # N.B. estimator should be our sklearn wrapper; we should enforce this
    # during the initial checks of cross_val_predict
    estimator.fit(X_train, y_train, confounders=C_train)
    predictions = estimator.predict(X_test, confounders=C_test)

    # ... probably some post-processing here ...

    return predictions
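To make the per-fold logic above concrete, here is a minimal, sequential (non-parallel) version of the same idea, with the deconfounding inlined as a simple linear residualizer (a hypothetical stand-in for the library's deconfounders; all names are illustrative):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def deconf_cross_val_predict(estimator, X, y, *, confounds, cv=5):
    """Out-of-fold predictions, deconfounding fitted on each training fold."""
    preds = np.empty(len(y), dtype=float)
    for train, test in KFold(n_splits=cv).split(X):
        # Fit the residualizer on the training fold only, to avoid leakage
        C_train = np.column_stack([np.ones(len(train)), confounds[train]])
        C_test = np.column_stack([np.ones(len(test)), confounds[test]])
        beta, *_ = np.linalg.lstsq(C_train, X[train], rcond=None)
        est = clone(estimator).fit(X[train] - C_train @ beta, y[train])
        preds[test] = est.predict(X[test] - C_test @ beta)
    return preds

rng = np.random.default_rng(0)
conf = rng.normal(size=(80, 1))
X = rng.normal(size=(80, 3)) + conf
y = X[:, 0] + 0.1 * rng.normal(size=80)

oof = deconf_cross_val_predict(LinearRegression(), X, y, confounds=conf, cv=4)
```

The key point the skeleton encodes is the same here: the deconfounder is fit only on the training indices of each fold and then applied to the held-out fold.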
    

raamana (Owner) commented Mar 7, 2022

This is a great idea, Javier! Instead of leaving it to the users to do the right thing, we can provide the necessary wrappers for the most common use cases and do the right thing for them. Please go ahead, with one suggestion: name these clearly differently from the sklearn counterparts to avoid any confusion, something like deconfounded_cross_validation.

jrasero (Contributor, Author) commented Mar 7, 2022

Sure, that makes sense.

I will start working on this this week.

jrasero (Contributor, Author) commented Mar 11, 2022

Ok, so after thinking about all of this these past few days, I think we need to implement at least the following objects:

  • A deconfounded estimator. We could name it "DeconfEstimator". This could in principle cover any task (regression/classification), and its main methods would be fit, fit_predict and predict.

  • A deconfounded transformer? We could name it "DeconfTransformer". Its main methods would be fit, fit_transform and transform.

  • A deconfounded cross_val_predict. We could name it "deconfounded_cv_predict"; the implementation would be based on the example above.

  • A deconfounded cross_val_score. We could name it "deconfounded_cv_score". Following the same rationale as sklearn, this would just call a function that we would create under the name "deconfounded_cross_validate" (sklearn's equivalent is "cross_validate").

  • A deconfounded optimization object, something like a "DeconfGridSearchCV".

All but the last should be simple to implement. As for the last one, it's a shame, but I don't see a quick way of leveraging the original scikit-learn GridSearchCV class to save coding (e.g. by inheriting from it), so maybe the best option is to copy as much code as possible from the original GridSearchCV class and adapt it to our purposes.

Finally, I anticipate a case that may give us some trouble in the future. In principle, the estimator passed to "DeconfEstimator" could be a pipeline object: the data would first be deconfounded and then go through the pipeline. The problem arises when the pipeline contains an imputation step to deal with NaNs in the data. In that case, unless confounds allows omitting the NaNs when deconfounding, I guess it will raise an error. Anyway, we can come back to this case in the future.

I'll start working on this. Follow-up here (https://github.com/jrasero/confounds/tree/sklearn_wrapper)
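As a rough, hedged illustration of what the "DeconfGridSearchCV" adaptation would have to do (not library code; all names are hypothetical, and the deconfounding is inlined as a simple linear residualizer), a sequential grid search could loop over ParameterGrid and score each candidate with deconfounded cross-validation:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, ParameterGrid

def deconf_grid_search(estimator, param_grid, X, y, confounds, cv=3):
    """Pick the parameter set with the best mean CV score on deconfounded data."""
    best_score, best_params = -np.inf, None
    for params in ParameterGrid(param_grid):
        fold_scores = []
        for train, test in KFold(n_splits=cv).split(X):
            # Residualize within the fold, training indices only
            C_tr = np.column_stack([np.ones(len(train)), confounds[train]])
            C_te = np.column_stack([np.ones(len(test)), confounds[test]])
            beta, *_ = np.linalg.lstsq(C_tr, X[train], rcond=None)
            est = clone(estimator).set_params(**params)
            est.fit(X[train] - C_tr @ beta, y[train])
            fold_scores.append(est.score(X[test] - C_te @ beta, y[test]))
        mean_score = float(np.mean(fold_scores))
        if mean_score > best_score:
            best_score, best_params = mean_score, params
    return best_params, best_score

rng = rng = np.random.default_rng(1)
conf = rng.normal(size=(90, 1))
X = rng.normal(size=(90, 4)) + conf
y = X[:, 0] + 0.1 * rng.normal(size=90)

best_params, best_score = deconf_grid_search(
    Ridge(), {"alpha": [0.1, 1.0, 10.0]}, X, y, conf)
```

The real implementation would of course need the refitting, parallelism and cv_results_ bookkeeping of GridSearchCV, which is why copying and adapting its code may be unavoidable.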

raamana (Owner) commented Mar 11, 2022

Thanks Javier. Most of our existing classes like Residualize() are already supposed to be sklearn estimators to the extent possible (what you refer to as a deconfounded estimator). I spent a lot of time trying to get them to pass sklearn's tests, but I realized their test suite has many issues and fundamental limitations, so let's not waste time on that. I don't follow the need for a Transformer, as our existing deconfounders are already transformers, right? I can see a clear need for cross_val_predict and GridSearchCV, but perhaps I am missing something, so let's discuss this in more detail before you invest too much time into it.

jrasero (Contributor, Author) commented Mar 11, 2022

Yes, yes, the deconfounders will always be transformers, but I was talking about the part that comes after them. For example, a PCA would be a transformer, and it has a different API from a classifier or regressor object.

But I guess implementing this kind of object could be secondary for now. I agree with you that cross_val_predict and GridSearchCV are the most important pieces right now.

raamana (Owner) commented Mar 11, 2022

Perhaps we could consider allowing users to pass an sklearn pipeline object (for preprocessing) prior to deconfounding and before the prediction estimator is applied. Let's start with the simpler case and, based on how it turns out, slowly add more useful features. We certainly don't want to recreate all of sklearn, and we want to do things that they can't or won't do.
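A hedged sketch of that ordering (hypothetical data and choices, with a simple linear residualizer standing in for the library's deconfounders): a user-supplied preprocessing Pipeline runs first, then deconfounding, then the prediction estimator:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
conf = rng.normal(size=(120, 1))          # confounding variable
X = rng.normal(size=(120, 5)) + conf      # features contaminated by it
y = rng.normal(size=120)

# Step 1: user-supplied preprocessing pipeline runs first
pre = make_pipeline(StandardScaler(), PCA(n_components=3))
X_pre = pre.fit_transform(X)

# Step 2: deconfound the preprocessed features (linear residualization
# as a stand-in for a deconfounder from the confounds library)
C = np.column_stack([np.ones(len(conf)), conf])
beta, *_ = np.linalg.lstsq(C, X_pre, rcond=None)
X_deconf = X_pre - C @ beta

# Step 3: fit the prediction estimator on the deconfounded features
preds = Ridge().fit(X_deconf, y).predict(X_deconf)
```

In a real cross-validation setting each of these three steps would have to be fit on the training fold only, as in the earlier sketches.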

raamana (Owner) commented Jun 9, 2022

Hi @jrasero, would you be participating in the OHBM BrainHack virtually? Let me know. I was thinking of picking up a few of the pending ideas/issues here and working on them during the hackathon/conference.

raamana changed the title from "sklearn wrapper" to "drop-in replacements for cross_val_predict and cross_val_score etc" on Jun 16, 2022
raamana (Owner) commented Jun 16, 2022

Hi @jrasero, let me know when you have some time today, so we can discuss where you are at and how we can get this finished during the hackathon.

jrasero (Contributor, Author) commented Jun 16, 2022

Hey @raamana, I am free now. Let me reach you via email to maybe set up a quick Zoom meeting?

jrasero (Contributor, Author) commented Jun 16, 2022

Here is the branch I created several months ago for this issue, and its status as of today:

https://github.com/jrasero/confounds/blob/sklearn_wrapper/confounds/sklearn.py

Once finished, I'll open the pull request.

jrasero (Contributor, Author) commented Jun 17, 2022

Ok @raamana, I finally implemented a few things today: a DeconfEstimator class, which first deconfounds the data and then runs a passed estimator; a deconfounded_cv_predict function to get predictions in a cross-validation scheme that includes deconfounding; and deconfounded_cv_score, the same but yielding the performance scores.

I've also added a few tests for these functionalities.

Please take a look and let me know if these look OK to you. I can open a pull request for all of this if you want.

raamana (Owner) commented Jun 17, 2022

Fantastic. Please send a PR when you are ready, Javi!
