# Trial-Unique Beta Maps Creation Processing Steps

## From the CIMAQ Memory Task (Image Encoding) fMRI Data - Within-Subject Level

### Goal: Feed Outputs (Beta Maps) as Features to a Within-Subject Nilearn Classifier

##### Input:

- Event files 
- Confounds (motion, etc) files
    - Generated by load_confound
- Preprocessed FMRIPrep data (4D .nii file)

**- Note: Data NOT Smoothed nor Denoised

##### Output: 

- 1 map (3D .nii file) of beta (regression) weights for each trial
- 1 concatenated 4D file of these 3D maps (trials ordered chronologically).

#### Version 1: Separate Model for Each Trial

- Trial of interest modelled as a separate condition (1 regressor)
- All other trials modelled in either the Encoding or Control condition (2 regressors)
   
**NOTE: Best of both models for enc/ctl trial classification**

#### Version 2: Separate model for Each Trial

- Trial of interest modelled as a separate condition (1 regressor)
- All other trials modelled as a single "other" condition (1 regressor)

Reference: How to derive beta maps for MVPA classification (Mumford et al., 2012):

https://www.sciencedirect.com/science/article/pii/S1053811911010081

#### Also creating contrasts per condition (to derive features for between-subject classification): 

 - Modeling enconding and control conditions across trials
     - 3 beta maps:
         - encoding (enc) , control (ctl), and encoding minus control (enc_minus_ctl)
 - Modeling control condition, as well as the encoding condition according to task performance:
    - miss and hit (post-scan image recognition performance)
    - 5 beta maps:
        - miss, hit hit_minus_miss, hit_minus_ctl, miss_minus_ctl
    - Modeling control condition & encoding condition according to task performance:
        - miss, wrong source, and correct source
    - 7 beta maps:
        - wrong_source, corr_source, cs_minus_ws, cs_minus_miss, ws_minus_miss, cs_minus_ctl, ws_minus_ctl
        

### Step 1: Load confound parameters

```
from load_confounds import Minimal

confounds = Minimal().load(FMRIPrep/preprocessed/fmri_img/file/path)
```

### From the load_confounds README document:

#### Note on low pass filtering

Common operation in resting-state fMRI analysis
- Featured in all preprocessing strategies of the Ciric et al. (2017) paper

fMRIprep does not output low pass filtering discrete cosines
- Can be implemented directly with the nilearn masker
    - ``low_pass`` argument 
**Specify the nilearn masker argument ``t_r`` if low_pass is used

#### Note on high pass filtering and detrending

Nilearn maskers & first-level model can remove slow time drifts & noise:
- ``high_pass`` & ``detrend`` arguments
- Both **redundant** with fMRIprep high_pass regressors
    - Both included in all load_confounds strategies
**Do NOT use nilearn's ``high_pass`` or ``detrend`` options with the default strategies.**

- A flexible ``Confounds`` loader can exclude fMRIprep high_pass noise components
    - Allows relying on nilearn's ``high_pass`` or ``detrending`` options
    **- NOT advised with compcor or ica_aroma analysis**

#### Note on demeaning confounds

**Confounds should be demeaned** (default load_confounds behaviour)
- Required to properly regress out confounds using nilearn
    - With the standardize=False, standardize=True or standardize="zscore" options
    - standardize="psc" requires turning off load_confounds demeaning option
    ```
    from load_confounds import Params6
    conf = Params6(demean=False)
    ```
    - Unless using nilearn maskers or first-level model ``detrend`` or ``high_pass`` options


### Step 2: create events variable & events.tsv file

#### From the 'sub-*_ses-V*_task-memory_events.tsv' file outputed by cimaq2bids.py 

Number of rows = number of trials

- First-level model uses trial onset times to match trial conditions to fMRI frames

Documentation:

https://nistats.github.io/auto_examples/04_low_level_functions/write_events_file.html#sphx-glr-auto-examples-04-low-level-functions-write-events-file-py

- Each encoding trial is modelled as a different condition (under trial_type column)
    - Modelled separately in the design matrix
        - Trial of interest has its own column in the design matrix
        - Other columns = other trials &  confound regressors
            - Modelled together as a single regressor

**Note: Some scans were cut short**
- The last few trials have NO associated brain activation frames
    - These need to be left out of the analysis
- MEMO: 310 frames = full scan, 288 frames = incomplete (~15 participants).

- "unscanned" trials need to be excluded from the model (about ~2-4 trials missing).

- E.g.:
    - 288*2.5 = 720s.
    - Trial #115 (out of 117) offset time ~ 710s
    - Trial #116 (out of 117) onset ~ 723s


### Step3 : Implement first-level model (implements regression in nilearn)

#### Generates contrasts and output maps of beta values (parameter estimators; one map of betas per trial).

**About first-level model:

Note 1: ``nilearn.glm.first_level_model`` provides an interface for ``nilearn.glm``

Note 2: Each encoding trial is modelled as a separate condition to obtain separate maps of beta values
- Model's output type to get betas = **effect sizes**

    - ``nilearn.glm.first_level.FisrtLevelModel.compute_contrast`` ``output_type`` parameter name

- Version A:
    - Control trials & encoding trials are modelled separately
        - 2 regressors
- Version B:
    - Control trials & encoding trials are modelled together
        - Single "other_trials" condition (1 regressor)

Note 3: the first_level_model can either be given its parameters in 2 ways:

1. Pre-constructed **design matrix**
    - 2-step method (chosen method here)
    - Built from the events and confounds files in a separate preparatory step
    - Takes precedence over the events and confounds parameters (method 2)
2. Events & confounds directly
    - 1-step method
    - Skipping the need to create a design matrix in a separate step
    - The model will generate the design matrix automatically


Nilearn links on design matrices:

https://nilearn.github.io/modules/generated/nilearn.plotting.plot_design_matrix.html#nilearn.plotting.plot_design_matrix

https://nilearn.github.io/modules/generated/nilearn.glm.first_level.make_first_level_design_matrix.html#nilearn.glm.first_level.make_first_level_design_matrix

##### Examples

- First-Level Model

https://nilearn.github.io/modules/generated/nilearn.glm.first_level.FirstLevelModel.html#nilearn.glm.first_level.FirstLevelModel

- Beta Map Extraction (Nistats)

https://github.com/poldracklab/fitlins/pull/48

**About contrasts and maps of beta values (parameter estimators):

- To access the estimated coefficients (betas of the GLM model)
    - We need to specify "canonical contrasts" (one per trial) isolating design matrix columns
        - Each contrast has a single 1 in its corresponding colum, and 0s for all the other columns


In [1]:
%matplotlib inline

import builtins
import glob
import itertools
import json
import more_itertools
import nibabel
import nilearn
import numpy as np
import os
import pandas as pd
import re
import scipy
import seaborn as sns
import sys
import tqdm
import typing
import warnings

from load_confounds import Minimal
from matplotlib import pyplot as plt
from nibabel.nifti1 import Nifti1Image
from os import PathLike
from pathlib import Path, PosixPath
from random import sample
from builtins import FutureWarning
warnings.filterwarnings(action='ignore', category=FutureWarning)
from nilearn.glm.first_level import FirstLevelModel, check_events
from nilearn.glm.first_level import make_first_level_design_matrix
from nilearn import image as nimage
from nilearn.input_data import MultiNiftiMasker, NiftiLabelsMasker
from nilearn.input_data import NiftiMapsMasker, NiftiMasker
from nilearn.input_data import NiftiSpheresMasker
from nilearn import plotting as niplot
from nilearn import image as nimage
from scipy.spatial import procrustes
from scipy.stats import pearsonr
from sklearn.cluster import FeatureAgglomeration
from sklearn.decomposition import PCA, FastICA
from sklearn.feature_selection import RFE, RFECV
from sklearn.feature_selection import SequentialFeatureSelector, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import make_scorer, recall_score, pairwise_distances
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC
from sklearn.utils import Bunch
from tqdm import tqdm as tqdm_
from typing import Iterable, Sequence, Union

from get_difumo import get_difumo
from get_difumo_cut_coords import get_difumo_cut_coords
from cimaq_decoding_pipeline import get_fmri_sessions, fetch_fmriprep_session, validate_model
from cimaq_decoding_pipeline import get_contrasts, get_all_contrasts, get_glm_events
from cimaq_decoding_utils import flatten, get_t_r, get_frame_times

sns.set(rc={'figure.figsize': (18,18)})

In [2]:
from sklearn.feature_selection import chi2
from sklearn.feature_selection import f_classif, f_regression
from sklearn.feature_selection import mutual_info_classif
from sklearn.feature_selection import mutual_info_regression
from sklearn.feature_selection import GenericUnivariateSelect


def GetSklearnPairwiseMetrics():
    while True:
        sklearn_func_names = ['cityblock', 'cosine', 'euclidean',
                              'l1', 'l2', 'manhattan']
        sklearn_dict = sklearn.metrics.pairwise.distance_metrics()
        yield pd.Series(itemgetter(*sklearn_func_names)(sklearn_dict),
                        index=sklearn_func_names, name='sklearn_funcs')
        

def GetScipyPairwiseMetrics():
    while True:
        scipy_func_names = ['braycurtis','canberra', 'chebyshev',
                            'correlation', 'dice', 'hamming',
                            'jaccard', 'kulsinski', 'mahalanobis',
                            'minkowski', 'rogerstanimoto',
                            'russellrao', 'seuclidean',
                            'sokalmichener', 'sokalsneath',
                            'sqeuclidean', 'yule']
        scipy_dict = dict(inspect.getmembers(scipy.spatial.distance))
        yield pd.Series(data=itemgetter(*scipy_func_names)(scipy_dict),
                        index=scipy_func_names, name='scipy_funcs')


def GetPairwiseMetrics():
    import inspect
    import scipy
    import sklearn
    from operator import itemgetter

    sklearn_funcs = next(GetSklearnPairwiseMetrics())
    scipy_funcs = next(GetScipyPairwiseMetrics())
    
    yield pd.concat([sklearn_funcs, scipy_funcs])


def ComputeWholePCA(pca: PCA,
                    X,
                    y = None,
                    n_features: Union[int, float] = None
                    ):
#     if kwargs is None:
#         kwargs = {}
#     pca=PCA(**kwargs).fit(X)
    setattr(pca, 'feature_ranks_', np.argsort(pca.explained_variance_)[:n_features])
    setattr(pca, 'sorted_feature_names_in_',
            pca.feature_names_in_[np.argsort(pca.explained_variance_)])
    setattr(pca, 'feature_names_out_', pca.feature_names_in_[pca.feature_ranks_])

    attribute_names = ['feature_ranks_',
                       'explained_variance_', 'explained_variance_ratio_',
                       'singular_values_', 'estimated_mean_',
                       'log_likelihood_']

    size = len(pca.feature_names_out_)
    sorted_components = pd.DataFrame(pca.components_,
                                     index=pca.sorted_feature_names_in_,
                                     columns=pca.sorted_feature_names_in_)

    attributes_ = pd.DataFrame((pca.feature_ranks_,
                                pca.explained_variance_[pca.feature_ranks_],
                                pca.explained_variance_ratio_[pca.feature_ranks_],
                                pca.singular_values_[pca.feature_ranks_],
                                pca.mean_[pca.feature_ranks_],
                                pca.score_samples(sorted_components)[pca.feature_ranks_]),
                               index=attribute_names,
                               columns=pca.feature_names_out_)
    
    setattr(pca, 'attributes_', attributes_)
    components_table_ = pd.DataFrame(pca.components_).iloc[pca.feature_ranks_,
                                                          pca.feature_ranks_]
    components_table_.set_axis(pca.feature_names_out_, axis=0, inplace=True)
    components_table_.set_axis(pca.feature_names_out_, axis=1, inplace=True)
    setattr(pca, 'components_table_', components_table_)

    covariance_table_ = pd.DataFrame(pca.get_covariance()).iloc[pca.feature_ranks_,
                                                                pca.feature_ranks_]
    covariance_table_.set_axis(pca.feature_names_out_, axis=0, inplace=True)
    covariance_table_.set_axis(pca.feature_names_out_, axis=1, inplace=True)
    setattr(pca, 'covariance_table_', covariance_table_)

    precision_table_ = pd.DataFrame(pca.get_precision()).iloc[pca.feature_ranks_,
                                                              pca.feature_ranks_]
    precision_table_.set_axis(pca.feature_names_out_, axis=0, inplace=True)
    precision_table_.set_axis(pca.feature_names_out_, axis=1, inplace=True)
    setattr(pca, 'precision_table_', precision_table_)

    return pca
#                                      index=pca.feature_names_out_,
#                                   name='covariance_')
#     return pd.concat([attributes, covariance_table_])


def SortFeaturesByConditionPCA(X: Iterable, y: Iterable,
                               method: str = 'pearson',
                               n_features: Union[int, float] = 50
                               ) -> Bunch:

    pairwise_metrics = next(GetPairwiseMetrics())
        
    if not isinstance(y, pd.Series):
        y = pd.Series(y)
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    if isinstance(n_features, float):
        n_features = int(round(X.shape[1] * n_features, 0))

    X = X.set_axis(y,axis=0)
    feature_names_in_ = X.columns
    classes_, estimators_ = y.unique(), []

    class_vectors = [X.loc[cond] for cond in classes_]
    if method in pairwise_metrics:
        method = pairwise_metrics[method]
        class_vectors = [pd.DataFrame(method(vec.T),
                                      index=feature_names_in_,
                                      columns=feature_names_in_)
                         for vec in class_vectors]
        
    else:
        class_vectors = [vec.corr(method) for vec in class_vectors]
    
    [estimators_.append(PCA().fit(vec))
     for vec in class_vectors]
    
    return [ComputeWholePCA(estimators_[vec[0]], X=vec[1],
                            n_features=n_features)
            for vec in enumerate(class_vectors)]


def AgglomerateMulticollinearCorrelated(X, y,
                                        n_clusters: int = 2,
                                        get_connectivity: bool = True,
                                        compute_distances=True,
                                        kind: str = 'correlation',
                                        agglo_kws: Union[dict, Bunch] = None
                                        ):

    from nilearn.connectome import ConnectivityMeasure as CM
    from sklearn.cluster import FeatureAgglomeration
    from sklearn.covariance import LedoitWolf

    
    agglo_defs = dict(affinity='euclidean',
                      compute_full_tree='auto',
                      linkage='ward',
                      pooling_func=np.mean,
                      distance_threshold=None,
                      compute_distances=compute_distances)
    
    if get_connectivity is True:
        connect_mat = CM(LedoitWolf(),
                         kind='correlation').fit_transform([X.values])[0]
    else:
        connect_mat = None

    if agglo_kws is None:
        agglo_kws = {}
    agglo_defs.update(agglo_kws)

    agglo = FeatureAgglomeration(n_clusters=n_clusters,
                                 connectivity=connect_mat,
                                 **agglo_defs)
    if not isinstance(y, pd.Series):
        y = pd.Series(y)
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    agglo.fit(X, y)

    rfe = RFE(estimator=PCA(), step=1,
              n_features_to_select=1,
              importance_getter='explained_variance_')


    rfe.fit(X00[agglo00['Cs'].cluster_names_[164]], y=session00.tasks[2])
    rfe.get_feature_names_out()
    
    setattr(agglo, 'classes_', y.unique())
    setattr(agglo, 'feature_names_out_', agglo.feature_names_in_[agglo.labels_])
    setattr(agglo, 'cluster_indexes_', pd.DataFrame(zip(agglo.labels_, agglo.feature_names_out_),
                                   columns=['cluster_id',
                                            'feature_name']).groupby('cluster_id').groups)
    setattr(agglo, 'cluster_names_', dict(tuple((itm[0], agglo.feature_names_in_[itm[1]])
                                                for itm in tuple(agglo.cluster_indexes_.items()))))

    skb = SelectKBest(k=1, score_func='mutual_info_classif')
    factor_leaders = [skb.fit(X[cluster_names], y).get_feature_names_out()
                      for cluster_names in tuple(agglo.cluster_names_.values())]
    
    
    setattr(agglo, 'new_features_', pd.concat([pd.Series(data=X.iloc[:, itm[1]].T.mean().values,
                                        name=','.join(X.iloc[:, itm[1]].columns),
                                        index=X.index)
                              if len(itm[1]) > 1 else X.iloc[:, itm[1]]
                              for itm in  tuple(agglo.cluster_indexes_.items())],
                             axis=1))
    
    return agglo


def RecurviseKBestLinearSVC(X: Iterable,
                            y: Iterable,
                            score_func: str = 'f_classif',
                            step: Union[int, float] = 1
                            ) -> pd.DataFrame:

    from cimaq_decoding_utils import factorGenerator
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import chi2
    from sklearn.feature_selection import f_classif, f_regression
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.feature_selection import mutual_info_regression
    from sklearn.feature_selection import GenericUnivariateSelect

    summary = []

    score_funcs = pd.Series([chi2,
                             f_classif, f_regression,
                             mutual_info_classif,
                             mutual_info_regression,
                             GenericUnivariateSelect],
                            index=['chi2',
                                   'f_classif', 'f_regression',
                                   'mutual_info_classif',
                                   'mutual_info_regression',
                                   'GenericUnivariateSelect'])
    
    if isinstance(step, float):
        step = int(round(divmod(X.shape[1], 2)[0] * step, 0))
    X = X.set_axis(y, axis=0)
    for n in tqdm_(range(step, divmod(X.shape[1], 2)[0], step)):
        k = divmod(X.shape[1], 2)[0]-n
        skb = SelectKBest(score_func=score_funcs.loc[score_func],
                          k=k)
        estimator_ = RFE(estimator=skb, step=step,
                         n_features_to_select=k,
                         importance_getter='scores_')

        best_features = estimator_.fit(X=X, y=y).get_feature_names_out()

        performance = round(validate_model(next(LinearSVCGen(
                          **{"class_weight": "balanced"})),
                                     X=X[best_features], y=y,
                                     test_size=0.8).accuracy.mean(),
                            2)

        summary.append(({'n_features': X[best_features].shape[1],
                         'selected': X[best_features],
                         'performance': performance}))
    results = pd.DataFrame(summary)
    interval = (results.performance.max()*0.97,
                results.performance.max()*1.03)
    results = results.where(results.performance.between(*interval))
    return results.dropna().reset_index(drop=True).iloc[-1:, :]


def histogram_intersection(a, b):

    v = np.minimum(a, b).sum().round(decimals=1)

    return v

def LinearSVCGen(**kwargs):
    defs_kws = dict(max_iter=100000,
                    class_weight='balanced')
    if kwargs is None:
        kwargs = {}
    defs_kws.update(kwargs)
    yield LinearSVC(**defs_kws)

    
def get_corr_sign(positive: bool = True):
    yield tuple(filter(lambda x: x[0],
                           ((positive, pd.DataFrame.gt),
                            (not positive, pd.DataFrame.lt))))[0][1]


def trimmed_corr_mat(X, method: str = 'spearman',
                     thresh: float = 0.9,
                     positive: bool = True,
                     ) -> pd.DataFrame:

    corr_mat = X.corr(method)
    corr_mat.where(corr_mat.values != np.triu(corr_mat.values),
                   inplace=True)
    eq_sign = next(get_corr_sign(thresh))
  
    corr_mat = corr_mat.dropna(how='all',
                               axis=0).dropna(how='all', axis=1)
    if thresh is not None:
        corr_mat = corr_mat[eq_sign(corr_mat, thresh)]
        corr_mat = corr_mat.dropna(how='all',
                                   axis=0).dropna(how='all', axis=1)
    return corr_mat


def pairwise_correlates(X, method: str = 'spearman',
                        thresh: float = 0.9,
                        positive: bool = True
                        ) -> pd.DataFrame:

    while True:
        try:
            new_names = list(trimmed_corr_mat(X, method=method,
                                              thresh=thresh,
                                              positive=positive
                                              ).stack().index.tolist()[0])
            new_feature = pd.Series(data=X[new_names].T.mean(),
                                    name=','.join(new_names),
                                    index=X.index)
            X = pd.concat([new_feature, X.drop(new_names, axis=1)], axis=1)
        except IndexError:
            break
    return X



def untangle(X: Iterable,
             y: Iterable,
             n_clusters: int = None,
             get_connectivity: bool = True,
             compute_distances: bool = True,
             kind: str = 'correlation',
             agglo_kws: Union[dict, Bunch] = None
             ) -> FeatureAgglomeration:

    from nilearn.connectome import ConnectivityMeasure as CM
    from sklearn.cluster import FeatureAgglomeration
    from sklearn.covariance import LedoitWolf
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import mutual_info_classif

    agglo_defs = dict(affinity='euclidean',
                      compute_full_tree='auto',
                      linkage='ward',
                      pooling_func=np.mean,
                      distance_threshold=None,
                      compute_distances=compute_distances)
    
    if get_connectivity is True:
        connect_mat = CM(LedoitWolf(),
                         kind=kind
                         ).fit_transform([X.values])[0]
    else:
        connect_mat = None

    if n_clusters is None:
        n_clusters = divmod(X.shape[1], 2)[0] - 1
        if n_clusters == 0:
            n_clusters = 1

    if agglo_kws is None:
        agglo_kws = {}
    agglo_defs.update(agglo_kws)

    agglo = FeatureAgglomeration(n_clusters=n_clusters,
                                 connectivity=connect_mat,
                                 **agglo_defs)
    if not isinstance(y, pd.Series):
        y = pd.Series(y)
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    agglo.fit(X, y)

    setattr(agglo, 'cluster_indexes_',
            pd.DataFrame(zip(agglo.labels_,
                             agglo.feature_names_in_),
                         columns=['cluster', 'feature']
                         ).groupby('cluster').feature)

    skb = SelectKBest(k=1, score_func=mutual_info_classif)
    factor_leaders_ = [skb.fit(X[itm[1]],
                               y).get_feature_names_out()[0]
                       for itm in tuple(agglo.cluster_indexes_)]
    setattr(agglo, 'factor_leaders_', factor_leaders_)
    return agglo


from builtins import UserWarning

def recursive_untangle(X: Iterable,
                       y: Iterable,
                       n_clusters: int = None,
                       get_connectivity: bool = True,
                       compute_distances: bool = True,
                       kind: str = 'correlation',
                       agglo_kws: Union[dict, Bunch] = None
                       ) -> FeatureAgglomeration:
#     while UserWarning:
      while X.shape[1] > 8:
#     half_size = divmod(X.shape[1], 2)[0]
#     while X.shape[1] > half_size -1:
        for n in range(X.shape[1] - 1):
            try:
                agg = (X[untangle(X, y,
                                  n_clusters=n_clusters,
                                  get_connectivity=get_connectivity,
                                  compute_distances=compute_distances,
                                  kind=kind,
                                  agglo_kws=agglo_kws
                                   ).factor_leaders_]
                       for n in range(X.shape[1]))
                next(agg)
                X = X[agg.send(X).columns]
            except (ValueError, UserWarning):
                break
        return X

# def RecurviseKBestLinearSVC(X: Iterable,
#                             y: Iterable,
#                             score_func: str = 'f_classif',
#                             step: Union[int, float] = 1
#                             ) -> pd.DataFrame:

#     from cimaq_decoding_utils import factorGenerator
#     from sklearn.feature_selection import SelectKBest
#     from sklearn.feature_selection import chi2
#     from sklearn.feature_selection import f_classif, f_regression
#     from sklearn.feature_selection import mutual_info_classif
#     from sklearn.feature_selection import mutual_info_regression
#     from sklearn.feature_selection import GenericUnivariateSelect
#     from nilearn.connectome import ConnectivityMeasure as CM
#     from sklearn.cluster import FeatureAgglomeration
#     from sklearn.covariance import LedoitWolf
#     from cimaq_decoding_utils import factorGenerator

#     summary = []

#     score_funcs = pd.Series([chi2,
#                              f_classif, f_regression,
#                              mutual_info_classif,
#                              mutual_info_regression,
#                              GenericUnivariateSelect],
#                             index=['chi2',
#                                    'f_classif', 'f_regression',
#                                    'mutual_info_classif',
#                                    'mutual_info_regression',
#                                    'GenericUnivariateSelect'])
    
#     if isinstance(step, float):
#         step = int(round(divmod(X.shape[1], 2)[0] * step, 0))
#     X = X.set_axis(y, axis=0)
#     for n in tqdm_(range(step, divmod(X.shape[1], 2)[0], step)):
#         k = divmod(X.shape[1], 2)[0]-n
#         skb = SelectKBest(score_func=score_funcs.loc[score_func],
#                           k=k)
#         estimator_ = RFECV(estimator=PCA(), step=1,
#                          min_features_to_select=1,
#                          importance_getter='explained_variance_ratio_')

#         best_features = estimator_.fit(X=X, y=y).get_feature_names_out()

#         performance = round(validate_model(next(LinearSVCGen(
#                           **{"class_weight": "balanced"})),
#                                      X=X[best_features], y=y,
#                                      test_size=0.8).accuracy.mean(),
#                             2)

#         summary.append(({'n_features': X[best_features].shape[1],
#                          'selected': X[best_features],
#                          'performance': performance}))
#     results = pd.DataFrame(summary)
#     interval = (results.performance.max()*0.97,
#                 results.performance.max()*1.03)
#     results = results.where(results.performance.between(*interval))
#     return results.dropna().reset_index(drop=True).iloc[-1:, :]

# ComputeWholePCA(X00.corr('spearman'), session00.tasks[2], n_features=20).__dict__


# method = 'cityblock'
# pwm = next(GetPairwiseMetrics())
# print(method in pwm)
# method = pwm['cityblock']
# method(X00).shape
# dict(inspect.getmembers(sklearn.metrics.pairwise.distance_metrics))
# callables_
# testmean0=X00.iloc[:, np.array([0,44, 5])].T.mean()
# testmean1=np.mean(X00.iloc[:, np.array([0,44, 5])].T.values)
# display(testmean0, testmean1,
#         np.mean(np.mean(X00.iloc[:, np.array([0,44, 5])].values, axis=1).T,
#                 ),
#         X00.iloc[:, 99].mean())


import sklearn



        
        
# FMRIPrepPathMatcher(sess5[0].fmri_path, **dict(events_dir=events_dir,
#                                                behav_dir=events_dir)).__dir__()

In [3]:
atlases_dir = '/data/simexp/fnadeau/nilearn_atlases/difumo_atlases/'
fmriprep_dir = '/data/simexp/cimaq_preproc/fmriprep/'
events_dir = '/data/simexp/fnadeau/CIMAQ_AS_BIDS_4/'
masker_dir = '/data/simexp/fnadeau/cimaq_maps_maskers/'
os.chdir('/data/simexp/fnadeau/')


In [None]:
# from cimaq_decoding_utils import preprocess_events
from cimaq_decoding_pipeline import get_fmri_sessions
from cimaq_decoding_utils import get_sub_ses_key
from cimaq_decoding_pipeline import fetch_fmriprep_session

v03, v10 = [get_fmri_sessions(topdir=fmriprep_dir,
                              events_dir=events_dir,
#                               masker_dir=masker_dir,
                              ses_id=ses)
            for ses in ["V03", "V10"]]

sess = v03+v10
dst = '/data/simexp/fnadeau/cimaq_computed_data/'

sessions = [ses for ses in sess
            if bool(get_sub_ses_key(ses.fmri_path) not in
                    list(map(get_sub_ses_key, os.listdir(dst))))]

selected = sample(sessions, 5)

sess5 = [fetch_fmriprep_session(session=ses)
         for ses in tqdm_(selected)]

In [288]:
from multiprocessing import Pool, Process


# print(len(cortex_names))

from cimaq_decoding_utils import save_masker
from builtins import FutureWarning
import warnings
labels_path = '/data/simexp/fnadeau/cortex-difumo-693-labels.txt'
cortex_labels = Path(labels_path).read_text().splitlines()
warnings.filterwarnings(action='ignore', category=FutureWarning)
# maps_masker = NiftiMapsMasker(maps_img=cortex_atlas,
#                               mask_img=session.mask_img,
#                               t_r=get_t_r(session.fmri_img),
#                               resampling_target='mask',
#                               **session.masker_defs).fit()
# save_masker(dst=masker_dir, masker=maps_masker, session=session)



trial_type_cols=['trial_type', 'recognition_performance', 'ctl_miss_ws_cs']

[session.update(dict(computed_ = get_all_contrasts(fmri_img=nimage.smooth_img(session.fmri_img, 8),
                               events=session.events,
                               masker=session.masker,
                               output_type='effect_size',
                               trial_type_cols=trial_type_cols,
                               standardize=True,
                               scale=False,
                               maximize=False,
                               glm_kws=session.glm_defs,
                               design_kws=session.design_defs,
                               feature_labels=session.feature_labels)))
 for session in tqdm_(sess5)]

  warn('Matrix is singular at working precision, regularizing...')

Computing Contrasts:   0%|                                                  | 0/117 [00:00<?, ?it/s][A
Computing Contrasts:   1%|▎                                         | 1/117 [00:00<00:53,  2.18it/s][A
Computing Contrasts:   2%|▋                                         | 2/117 [00:00<00:52,  2.19it/s][A
Computing Contrasts:   3%|█                                         | 3/117 [00:01<00:51,  2.23it/s][A
Computing Contrasts:   3%|█▍                                        | 4/117 [00:01<00:51,  2.19it/s][A
Computing Contrasts:   4%|█▊                                        | 5/117 [00:02<00:49,  2.24it/s][A
Computing Contrasts:   5%|██▏                                       | 6/117 [00:02<00:50,  2.21it/s][A
Computing Contrasts:   6%|██▌                                       | 7/117 [00:03<00:48,  2.27it/s][A
Computing Contrasts:   7%|██▊                                       | 8/117 [00:03<00:48,  2.24it/s]

Computing Contrasts:  17%|███████                                  | 20/117 [00:07<00:33,  2.91it/s][A
Computing Contrasts:  18%|███████▎                                 | 21/117 [00:07<00:32,  2.93it/s][A
Computing Contrasts:  19%|███████▋                                 | 22/117 [00:08<00:32,  2.97it/s][A
Computing Contrasts:  20%|████████                                 | 23/117 [00:08<00:31,  2.99it/s][A
Computing Contrasts:  21%|████████▍                                | 24/117 [00:08<00:31,  3.00it/s][A
Computing Contrasts:  21%|████████▊                                | 25/117 [00:09<00:30,  3.01it/s][A
Computing Contrasts:  22%|█████████                                | 26/117 [00:09<00:30,  3.03it/s][A
Computing Contrasts:  23%|█████████▍                               | 27/117 [00:09<00:29,  3.04it/s][A
Computing Contrasts:  24%|█████████▊                               | 28/117 [00:10<00:29,  3.02it/s][A
Computing Contrasts:  25%|██████████▏                           

Computing Contrasts:  37%|███████████████                          | 43/117 [00:14<00:24,  2.97it/s][A
Computing Contrasts:  38%|███████████████▍                         | 44/117 [00:15<00:24,  3.00it/s][A
Computing Contrasts:  38%|███████████████▊                         | 45/117 [00:15<00:28,  2.54it/s][A
Computing Contrasts:  39%|████████████████                         | 46/117 [00:16<00:27,  2.59it/s][A
Computing Contrasts:  40%|████████████████▍                        | 47/117 [00:16<00:26,  2.69it/s][A
Computing Contrasts:  41%|████████████████▊                        | 48/117 [00:16<00:26,  2.58it/s][A
Computing Contrasts:  42%|█████████████████▏                       | 49/117 [00:17<00:26,  2.55it/s][A
Computing Contrasts:  43%|█████████████████▌                       | 50/117 [00:17<00:25,  2.59it/s][A
Computing Contrasts:  44%|█████████████████▊                       | 51/117 [00:18<00:25,  2.61it/s][A
Computing Contrasts:  44%|██████████████████▏                   

Computing Contrasts:  56%|███████████████████████▏                 | 66/117 [00:24<00:18,  2.69it/s][A
Computing Contrasts:  57%|███████████████████████▍                 | 67/117 [00:24<00:18,  2.73it/s][A
Computing Contrasts:  58%|███████████████████████▊                 | 68/117 [00:25<00:17,  2.76it/s][A
Computing Contrasts:  59%|████████████████████████▏                | 69/117 [00:25<00:17,  2.79it/s][A
Computing Contrasts:  60%|████████████████████████▌                | 70/117 [00:25<00:16,  2.80it/s][A
Computing Contrasts:  61%|████████████████████████▉                | 71/117 [00:26<00:16,  2.82it/s][A
Computing Contrasts:  62%|█████████████████████████▏               | 72/117 [00:26<00:15,  2.84it/s][A
Computing Contrasts:  62%|█████████████████████████▌               | 73/117 [00:26<00:15,  2.86it/s][A
Computing Contrasts:  63%|█████████████████████████▉               | 74/117 [00:27<00:15,  2.86it/s][A
Computing Contrasts:  64%|██████████████████████████▎           

Computing Contrasts:  76%|███████████████████████████████▏         | 89/117 [00:34<00:10,  2.68it/s][A
Computing Contrasts:  77%|███████████████████████████████▌         | 90/117 [00:34<00:10,  2.61it/s][A
Computing Contrasts:  78%|███████████████████████████████▉         | 91/117 [00:35<00:09,  2.63it/s][A
Computing Contrasts:  79%|████████████████████████████████▏        | 92/117 [00:35<00:09,  2.60it/s][A
Computing Contrasts:  79%|████████████████████████████████▌        | 93/117 [00:36<00:09,  2.59it/s][A
Computing Contrasts:  80%|████████████████████████████████▉        | 94/117 [00:36<00:08,  2.61it/s][A
Computing Contrasts:  81%|█████████████████████████████████▎       | 95/117 [00:36<00:08,  2.63it/s][A
Computing Contrasts:  82%|█████████████████████████████████▋       | 96/117 [00:37<00:08,  2.58it/s][A
Computing Contrasts:  83%|█████████████████████████████████▉       | 97/117 [00:37<00:07,  2.61it/s][A
Computing Contrasts:  84%|██████████████████████████████████▎   

[None, None, None, None, None]

In [None]:
import pickle
from io import BytesIO

dst = '/data/simexp/fnadeau/cimaq_computed_data/'
os.makedirs(dst, exist_ok=True)

for ses in tqdm_(sess5):
    name = '_'.join([ses.sub_id, ses.ses_id, ses.task,
                     'computed-data.pickle'])
    with open(os.path.join(dst, name), mode='wb') as cornichon:
        pickle.dump(ses, cornichon, protocol=5)
    cornichon.close()
    
    
    

 40%|█████████████████▌                          | 2/5 [04:58<07:36, 152.12s/it]

In [68]:
score_cols = []
for ses in sess5:
    pca = PCA()
    pca.fit(ses.computed_.signal_matrix.corr())
    cols = pca.feature_names_in_[np.argsort(pca.explained_variance_)].tolist()
    score_cols.append(cols)
    


In [70]:
[sc[:15] for sc in score_cols]

[['corpus callosum genu rh',
  'caudate superior anterior',
  'cuneus postero-inferior lh',
  'pars opercularis lateral lh',
  'dorsolateral prefrontal cortex lh',
  'precuneus posterior',
  'lingual gyrus middle rh',
  'amygdala posterior',
  'insula superior lh',
  'parietal operculum medial rh',
  'precuneus posterior lh',
  'optic chiasm anterior',
  'anterior orbital gyrus rh',
  'anterior corona radiata superior lh',
  'cerebrospinal fluid (between postcentral gyrus and skull)'],
 ['corpus callosum genu rh',
  'caudate superior anterior',
  'cerebellum ix',
  'pars opercularis lateral lh',
  'dorsolateral prefrontal cortex lh',
  'precuneus posterior',
  'lingual gyrus middle rh',
  'amygdala posterior',
  'insula superior lh',
  'parietal operculum medial rh',
  'precuneus posterior lh',
  'optic chiasm anterior',
  'anterior orbital gyrus rh',
  'anterior corona radiata superior lh',
  'cerebrospinal fluid (between postcentral gyrus and skull)'],
 ['corpus callosum genu rh',
  

In [None]:
# sess5.keys()
import pickle
# help(pickle.load)
from nilearn import datasets
help(datasets.fetch_atlas_difumo)

In [30]:
from cimaq_decoding_params import _params
StandardScaler().fit_transform(sess5[1].computed_.weighted_matrices[0])

array([[ 0.40308973, -0.40333027,  0.3742795 , ...,  0.44217795,
         0.572422  ,  0.4070244 ],
       [-1.0280784 ,  0.9977893 , -0.97348243, ..., -1.0475073 ,
        -0.9361223 , -1.0114284 ],
       [ 0.4178063 , -0.40144676,  0.36165103, ...,  0.41062665,
         0.5699006 ,  0.4056827 ],
       ...,
       [ 0.40492907, -0.40445405,  0.40498817, ...,  0.40297922,
         0.57172173,  0.4046711 ],
       [ 0.40492907, -0.40445405,  0.40498817, ...,  0.40297922,
         0.57172173,  0.4046711 ],
       [ 0.40492907, -0.40445405,  0.40498817, ...,  0.40297922,
         0.57172173,  0.4046711 ]], dtype=float32)

In [34]:
sorted(sess5[0].keys())

['anat_img',
 'anat_path',
 'apply_defs',
 'behav',
 'behav_path',
 'clean_defs',
 'computed_',
 'confounds',
 'confounds_loader',
 'confounds_strategy',
 'design_defs',
 'events',
 'events_path',
 'feature_labels',
 'fmri_img',
 'fmri_path',
 'frame_times',
 'get_frame_times',
 'get_t_r',
 'glm_defs',
 'mask_img',
 'mask_path',
 'masker',
 'masker_defs',
 'masker_path',
 'ses_id',
 'smoothing_fwhm',
 'space',
 'sub_id',
 't_r',
 'task']

In [79]:
ohe = OneHotEncoder(sparse=False, categories='auto')#NoneX.index.tolist())
# help(OneHotEncoder)
ohe_encoded = ohe.fit_transform(np.array(X.index.tolist()).reshape(-1, 1))
ohe_encoded
# ohe_trans = ohe.transform(y.values.reshape(-1,1))
# ohe_trans
# help(train_test_split)

array([[1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.

In [142]:
sess5[0].events.iloc[:, -1]

0      Miss
1        Ws
2      Miss
3        Cs
4      Miss
       ... 
112      Cs
113      Cs
114    Miss
115      Cs
116      Cs
Name: ctl_miss_ws_cs, Length: 117, dtype: object

In [210]:
from sklearn.neural_network import MLPClassifier

# help(MLPClassifier)
ses1 = sample(sess5, 1)[0]
X, y = ses1.computed_.signal_matrix, ses1.events.iloc[:, -1].values

mlpc_params = dict(hidden_layer_sizes=(X.shape), activation='identity',
                   solver='lbfgs', alpha=0.0001, batch_size='auto',
                   learning_rate='constant', learning_rate_init=0.001,
                   power_t=0.5, max_iter=200, shuffle=True,
                   random_state=None, tol=0.0001, warm_start=False,
                   momentum=0.9, nesterovs_momentum=True,
                   early_stopping=False, validation_fraction=0.1,
                   beta_1=0.9, beta_2=0.999, epsilon=1e-08,
     
                   n_iter_no_change=10, max_fun=15000)





# y = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_base)
# help(PLSRegression)
X_train, X_test, y_train, y_test = train_test_split(X, y.reshape(-1, 1),
                                                    stratify=y.reshape(-1, 1),
                                                    test_size=0.8,
                                                    random_state=4)


y_train.shape, y_test.shape
y_train, y_test = y_train.reshape(-1, 1), y_test.reshape(-1, 1)
y_train.shape, y_test.shape

mlpc = MLPClassifier(**mlpc_params)
mlpc.fit(X_train, y_train)

mlpc.__dict__ = Bunch(**mlpc.__dict__)


    
train_data = Bunch(**{'y_pred_train': mlpc.predict(X_train),
                      'log_proba_train': mlpc.predict_log_proba(X_train),
                      'proba_train': mlpc.predict_proba(X_train)})
setattr(mlpc, 'training', train_data)
# mlpc.__dict__.y_pred_train

setattr(mlpc, 'test_score', mlpc.score(X_test, y_test))

ses2 = sample([s for s in sess5
               if s != ses1], 1)[0]

X2, y2 = ses2.computed_.signal_matrix, ses.events.iloc[:, -1].values.reshape(-1, 1)

assert ses1 != ses2
setattr(mlpc, 'test_score', mlpc.score(X_test, y_test))
setattr(mlpc, 'unseen_test_score', mlpc.score(X2, y2))

pd.DataFrame(mlpc.coefs_)
# mlpc.__dict__.proba_train.shape, np.array(mlpc.coefs_).shape#,
# mlpc.score(mlpc.__dict__.proba_train, y_train)
#                       'train_score': mlpc.score(mlpc.predict(X_train), y_train)})
#                       })

# mlpc.__dict__.update(**{'y_pred_test': mlpc.predict(X_test),
#                       'log_proba_test': mlpc.predict_log_proba(X_test),
#                       'proba_test': mlpc.predict_proba(X_test)})



  y = column_or_1d(y, warn=True)
  values = np.array([convert(v) for v in values])


Unnamed: 0,0
0,"[[0.021768276479579792, -0.06940673021982627, ..."
1,"[[0.01755201481798901, 0.04948430157839832, -0..."
2,"[[0.02170379823905372, -0.04131972278692515, 0..."


In [286]:
estimators = [next(LinearSVCGen())]*len(sess5)

trial_type_cols=['trial_type', 'recognition_performance', 'ctl_miss_ws_cs']

condition_coef_ = [[next(LinearSVCGen()).fit(ses[1].computed_.signal_matrix,
                                             ses[1].events[col].values)
                    for col in trial_type_cols]
                   for ses in enumerate(sess5)]
allcoefs = flatten(condition_coef_)

coef_weights = [pd.DataFrame(est.coef_,
                     columns=est.feature_names_in_[np.argsort(np.abs(est.coef_)).T])
                for est in allcoefs]

# [data[1].set_axis(condition_coef_[data[0]], axis=0, inplace=True)
#  if data[1].shape[0] > 1 else data[1] for data in enumerate(coef_weights)]


pca_list = [PCA()]*len(coef_weights)
multi_pca = [pca_list[mat[0]].fit(mat[1].corr('spearman')).transform(mat[1])
             for mat in enumerate(coef_weights)]

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

In [285]:
display([pca.explained_variance_ for pca in pca_list])

[array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03, 5.37142058e-04, 9.69072651e-33]),
 array([9.18407854e-03, 3.24485943e-03

In [None]:
[np.mean([validate_model(next(LinearSVCGen()),
                         X=ses.computed_.signal_matrix,
                         y=ses.events[col],
                         stratify=ses.events[col],
                         test_size=0.8).accuracy.mean()
  for ntimes in range(20)])
 for col in ['trial_type', 
             'recognition_performance',
             'ctl_miss_ws_cs']]

In [140]:

X = sess5[1].computed_.signal_matrix
y_base = sess5[1].events.iloc[:, -1].values.reshape(-1, 1)
# y = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_base)
# help(PLSRegression)
X_train, X_test, y_train, y_test = train_test_split(X, y_base,
                                                    stratify=y_base,
                                                    test_size=0.8,
                                                    random_state=4)

linsvc = next(LinearSVCGen())
pca = PCA()

linsvc.fit(X_train, y_train)
pca.fit(X_train)

# linsvc.coef_.shape, .shape
# X_train.shape, pca.explained_variance_.shape

plsr = PLSRegression(n_components=X_train.shape[0])

plsr.fit(X_train, pca.explained_variance_) #pd.DataFrame(linsvc.coef_).mean().values.reshape(-1, 1))
y_pred = plsr.predict(X_test)

plsr.score(y_pred, linsvc
# linsvc.fit(y_pred, y_train).coef_.shape
# pd.DataFrame
# wtf=pd.DataFrame(linsvc.coef_, columns=linsvc.feature_names_in_,
#                   index=linsvc.classes_)#[np.argsort(linsvc.coef_.T)])
# wtf
# np.argsort(linsvc.coef_).shape
# help(np.ravel)


  y = column_or_1d(y, warn=True)


ValueError: X has 1 features, but PLSRegression is expecting 1024 features as input.

In [139]:
# np.ravel(linsvc.coef_).shape
X_test.shape, y_test.shape

((90, 1024), (90, 1))

In [92]:
from scipy.signal import savgol_filter
 
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OneHotEncoder

X = sess5[1].computed_.signal_matrix
y_base = sess5[1].events.iloc[:, -1].values.reshape(-1, 1)
# y = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_base)
# help(PLSRegression)
X_train, X_test, y_train, y_test = train_test_split(X, y_base,
                                                    stratify=y_base,
                                                    test_size=0.8,
                                                    random_state=4)



plsr = PLSRegression(n_components=X.shape[1])

plsr.fit(X_train, y_train)
y_pred = plsr.predict(X_train)
y_pred
# plsr.__dict__
# plsr.score(, X_train)
# pca = PCA()

# pca.fit(wtest[0].corr())
# 
# display(pca.feature_names_in_.shape,
#  pca.feature_names_in_[np.argsort(pca.explained_variance_ratio_)].shape)

ValueError: could not convert string to float: 'Ctl'

In [None]:
di1024 = get_difumo(data_dir=atlases_dir,
                    dimension=1024,
                    resolution_mm=3)

In [None]:
# # sorted(Path(os.getcwd()).rglob('difumo_atlases/'))
# # help(Path().rglob)
# # kwargs = dict(dimension=1024, resolution_mm=3)

# # attempt = f'{kwargs['resolution_mm']}mm/'
# sorted(sessions[0].keys())


# # sorted(Path(os.getcwd()).rglob())
help(NiftiMapsMasker)

In [None]:
# def make_maskers(sessions):
#     maskers = [NiftiMapsMasker(maps_img=di1024.maps,
#                                      mask_img=nimage.load_img(session.mask_path),
#                                      resampling_target='maps',
#                                      **{'standardize': False,
#                                         'standardize_confounds': False,
#                                         'high_variance_confounds': False,
#                                         'smoothing_fwhm': None,
#                                         'detrend': False,
#                                         'dtype': 'f',
#                                         'low_pass': None,
#                                         'high_pass': None,
#                                         'allow_overlap': True})
#                for session in sessions]
#     return tuple(map(NiftiMapsMasker.fit, maskers))

# makser_list = make_maskers(sessions)

In [None]:


# def MaskerGen(**kwargs):
#     while True:
#         yield NiftiMapsMasker(maps_img=di1024.maps,
#                               mask_img=session.mask_img,
#                               t_r=get_t_r(session.fmri_img),
#                               resampling_target='mask',
#                               **session.masker_defs).fit()

# common_maps_masker = NiftiMapsMasker(maps_img=di1024.maps,
#                                      mask_img=session.mask_img,
#                                      resampling_target='maps',
#                                      **{'standardize': False,
#                                         'standardize_confounds': False,
#                                         'high_variance_confounds': False,
#                                         'smoothing_fwhm': None,
#                                         'detrend': False,
#                                         'dtype': 'f',
#                                         'low_pass': None,
#                                         'high_pass': None,
#                                         'allow_overlap': True}).fit()


In [None]:
# session00, session01, session02 = [fetch_fmriprep_session(session=ses)
#                                    for ses in sessions[:3]
#                                    if ses.sub_id not in [s.sub_id for s in sessions2]]

In [None]:
# display(*sess5[0].computed_.weighted_matrices)
len(os.listdir(dst))

In [None]:
sorted(sess5[0].keys())

In [14]:
w_validation.values

array([[{'trial_type': 0.9906382978723406}],
       [{'trial_type': 0.9914893617021278}],
       [{'trial_type': 0.9914893617021278}],
       [{'trial_type': 1.0}],
       [{'trial_type': 0.9910638297872342}]], dtype=object)

In [9]:
w_validation = pd.DataFrame([[{ses.computed_.weighted_matrices[n].index.name:
      np.mean([validate_model(next(LinearSVCGen()),
                              X=ses.computed_.weighted_matrices[n],
                              y=ses.computed_.weighted_matrices[n].index,
                              test_size=0.8,
                              stratify=ses.computed_.weighted_matrices[n].index,
                              ).accuracy.mean()
         for x in range(25)])
 for n in range(3)}]
 for ses in tqdm_(sess5)])

100%|████████████████████████████████████████████| 5/5 [23:20<00:00, 280.08s/it]


In [None]:
np.mean([validate_model(next(LinearSVCGen()),
                      X=ses.computed_.weighted_matrices[n],
                      y=ses.computed_.weighted_matrices[n].index,
                      test_size=0.8,
                      stratify=ses.computed_.weighted_matrices[n].index,
                      ).accuracy.mean()
 for x in range(25)])

In [None]:
# str(type(com_)).rsplit('.', maxsplit=1)[1]
# dict(getmembers(com_))


html_list
# os.listdir(html_dir).__len__()
# valid_html = [apath for apath in html_list
#               if get_sub_ses_key(apath)[0] in to_validate]
[apath for apath in html_list if
 os.path.basename(apath).split('.')[0]
 in [s.sub_id for s in sessions]].__len__()
# valid_html

In [None]:
from sklearn.feature_selection import VarianceThreshold

a, b, c = sess5[0].computed_.weighted_matrices

vt = [VarianceThreshold()]*len([a, b, c])


[vt[i[0]].fit(i[1]) for i in enumerate([a, b, c])]



# display(*[pd.Series(vt[0].variances_,
#                     index=vt[0].get_feature_names_out()).sort_values()[:20],
#           pd.DataFrame(a).var().sort_values().iloc[:20]])
from collections import Counter
Counter(flatten([d.index.tolist() for d in [pd.Series(vt[0].variances_,
                    index=vt[0].get_feature_names_out()).sort_values()[:20],
          pd.DataFrame(a).var().sort_values().iloc[:20]]])).most_common()
# vt.fit_transform(a).shape

# pd.DataFrame(enumerate(tuple(zip(vt.get_feature_names_out(),))))
    
    
# vt.var = pd.Series(data=vt.variances_, index=vt.get_feature_names_out())

# vt.var
# vt.variances_table_.where(vt.variances_table_>vt.variances_table_.quantile(0.25)).dropna(axis=0, how='all')
#.sort_values('variances_')

# pd.DataFrame(describe(X000))
# sorted(describe(X000).__dir__())
# from scipy.stats.stats import DescribeResult
# help(np.argsort)
# pca0 = PCA().fit(weighted_matrices[2].corr('spearman'))
# pd.DataFrame(pca0.explained_variance_)
# display(np.argsort(pca0.explained_variance_, axis=0),
#         np.argsort(pca0.explained_variance_, axis=1))
# from collections import namedtuple
# namedtuple(describe(X000))
# describe(X000)._asdict()['minmax']

with open(os.getcwd(), mode='r') as pwd:
    

In [None]:
def describe_data(X: Iterable
                  ) -> pd.DataFrame:
    from scipy.stats import describe
    from scipy.stats.stats import DescribeResult
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    _fields = DescribeResult._fields[2:]
    min_, max_ = _desc_.minmax
    _desc_ = pd.DataFrame(([min_, max_] +
                           list(describe(X)[2:])),
                          index=(['min', 'max',] +
                                 list(_fields)))
    
    desc_ = X.describe()

    return _desc_, desc_

display(describe_data(X000))

In [None]:
# sorted(sessions3[0].computed_.keys())

# display(*sessions3[0].computed_.weighted_matrices,
#         sessions3[0].computed_.whole.signals)
from itertools import combinations
# [mat.values[0] == mat.values[1] for mat in sessions3[0].computed_.weighted_matrices]
# display(sessions3[0].computed_.weighted_matrices[].values ==
#         sessions3[0].computed_.weighted_matrices[3].values)

[itm[0].values == itm[1].values for itm in
 tuple(combinations(sessions3[0].computed_.matrices, r=2))]

In [None]:
tuple(sessions3[0].tasks)

In [None]:
from cimaq_decoding_pipeline import weightings

l0 = [sessions3[0].computed_[col].signals
          for col in trial_type_cols]
matrices = [sessions3[0].computed_.whole.signals.copy(deep=True).set_axis(sessions3[0].tasks[n])
            for n in range(len(l0))]

# display(*matrices)
weighted_matrices = [weightings(matrices[task[0]], l0[task[0]])
                     for task in enumerate(sessions3[0].tasks)]

display(*weighted_matrices)



# l1 = sessions3[0].computed_.weights
# display(*[itm[0].values==itm[1].values
#         for itm in tuple(zip(l0, l1))])

In [None]:
pca0 = PCA().fit(weighted_matrices[2].corr('spearman'))


In [None]:
weighted_matrices[2]

In [None]:
ranks_ = np.argsort(pca0.explained_variance_)
pd.DataFrame(pca0.explained_variance_,
             index=pca0.feature_names_in_[ranks_])

In [None]:
sorted(pca0.__dict__.keys())

In [None]:
from sklearn.feature_selection import VarianceThreshold
from scipy.stats import describe
from operator import attrgetter, itemgetter

X000 = sessions3[0].computed_.trial_type.signals

# display(X000.describe(), describe(X000))
# PCA().fit()

# describe(X000)._asdict()

# sorted(dir(describe(X000)))
# itemgetter(*describe(X000)._fields)
# pd.DataFrame(describe(X000)[1:])

# help(scipy.stats.stats.DescribeResult)
scipy.stats.stats.DescribeResult._fields

In [None]:
sessions3 = [session00, session01, session02]

[setattr(ses, 'tasks', [ses.events[col] for col in trial_type_cols])
 for ses in sessions3]

In [None]:
display(*[ses.computed_.signal_matrix for ses in sessions3])

In [None]:
common_maps_signals = [pd.DataFrame(next(MaskerGen(ses)).fit_transform(
                           nimage.smooth_img(ses.contr, 8)),
                                    columns=di1024.labels.difumo_names)
                       for ses in tqdm_(sessions2)]

In [None]:
display(*common_maps_signals)

In [None]:
# signal_paths = sorted(Path('/data/simexp/cimaq_extracted_signals/').iterdir())
# trial_type_cols=['trial_type', 'recognition_performance', 'ctl_miss_ws_cs']

# [ses.update({'signals_path':[apath for apath in signal_paths
#                              if get_sub_ses_key(apath) ==
#                              get_sub_ses_key(ses.fmri_path)][0]})
#  for ses in sessions]

# [ses.update({'signal_matrix': pd.read_csv(ses.signals_path, sep='\t')})
#  for ses in sessions]

# [ses.update({'events': pd.read_csv(ses.events_path, sep='\t')})
#  for ses in sessions]

# [session.update(Bunch(**dict(tasks=(session.events.trial_type,
#                                     session.events.recognition_performance.replace({'Miss':'Fail'}),
#                                     session.events.iloc[:, -1]))))
#  for session in sessions]

In [None]:
# [preprocess_events(session=ses).to_csv(ses.events_path, sep='\t',
#                                        encoding='UTF-8-SIG',
#                                        index=True)
#  for ses in tqdm_(sessions)]

In [None]:
# print(f'{session00.sub_id}_{session00.ses_id}\n',
#       f'{session01.sub_id}_{session01.ses_id}\n',
#       f'{session02.sub_id}_{session02.ses_id}')

In [None]:
foremidbrain_rois = ['thalam', 'pons', 'tegment', 'medulla', 'putamen',
                     'fornix', 'cerebell', 'pedunc', 'tect', 'diencep',
                     'striatum', 'fluid', 'colli', 'pars', 'nuc', 'reticu',
                     'caud', 'ventricle', 'midbrain', 'mamm', 'fossa',
                     'caps', 'gang', 'globus', 'claus', 'sinus']

wm_rois = ['fasciculus', 'tract' 'forceps', 'callosum',
           'cingulum', 'forceps', 'chiasm', 'corticospinal',
           'radiation', 'radiata']


no_cortex = foremidbrain_rois + wm_rois



In [None]:
di693 = get_difumo(data_dir=atlases_dir, dimension=1024, resolution_mm=3)

not_cortex = di693.labels.set_index('difumo_names').T.filter(regex='|'.join(no_cortex)).columns.tolist()

# tmp = di693.labels.set_index('difumo_names').T
cortex693_labels = di693.labels.set_index('difumo_names').T.drop(not_cortex, axis=1).T
cortex693_labels.shape

In [None]:
cortex_693_maps_indexes = di693.labels.reset_index(drop=False).set_index(
                              'difumo_names', drop=False).loc[
                                  cortex693_labels.index].component

cortex_693_maps = nimage.index_img(di693.maps, cortex_693_maps_indexes)


In [None]:
sessions[0].clean_defs

In [None]:
import load_confounds
from inspect import getmembers
from nilearn import image as nimage
from sklearn.utils import Bunch
from pathlib import Path

from cimaq_decoding_params import _params
from cimaq_decoding_utils import get_t_r, get_frame_times
from get_difumo import get_difumo

strategy='Minimal'
lc_kws={}
loader = dict(getmembers(load_confounds))[f'{strategy}']
loader
# loader = [loader(**lc_kws) if lc_kws is not None
#           else loader()][0]
# conf = pd.DataFrame(loader.load(session['fmri_path']))


In [None]:
help(next(ConfLoaderGen(**{})).load)

In [None]:
def ConfLoaderGen(strategy: str = 'Minimal',
                  **kwargs):
    import load_confounds
    from inspect import getmembers
    if kwargs is None:
        kwargs = {}
    ConfGen = (dict(getmembers(load_confounds))
               [f'{strategy}'](kwargs))
    while True:
        yield ConfGen
# sessions[0].fmri_path
next(ConfLoaderGen(**{})).load(['/data/simexp/cimaq_preproc/fmriprep/sub-3025432/ses-V03/func/sub-3025432_ses-V03_task-memory_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz'])

In [None]:
def ConfLoaderGen(strategy: str = 'Minimal',
                  **kwargs):

    import load_confounds
    from inspect import getmembers

    if kwargs is None:
        kwargs = {}
    ConfGen = (dict(getmembers(load_confounds))
               [f'{strategy}'](kwargs))
    while True:
        yield ConfGen


def CleanImgGen(img,
                strategy='Minimal',
                mask_img=None,
                confounds=None,
                lc_kws=None,
                **kwargs):


    from nilearn import image as nimage
    from sklearn.utils import Bunch
    from pathlib import Path

    from cimaq_decoding_params import _params
    from cimaq_decoding_utils import get_t_r, get_frame_times
    from get_difumo import get_difumo

    if lc_kws is None:
        lc_kws = {}
    loader = next(ConfLoaderGen(strategy=strategy,
                                **lc_kws))
    conf = pd.DataFrame(loader.load(img))
    return conf
#     from nilearn.image import clean_img, load_img

#     clean_defs = {'detrend': False,
#                   'standardize': False,
#                   'low_pass': None,
#                   'high_pass': None,
#                   'ensure_finite': True}
#     if kwargs is None:
#         kwargs = {}
#     clean_defs.update(**kwargs)
#     img=load_img(img)
#     if mask_img is not None:
#         mask_img=load_img(mask_img)

#     return clean_img(img, mask_img=mask_img,
#                      tr=get_t_r(img),
#                      confounds=conf,
#                      **clean_defs)

CleanImgGen(img=sessions[0].fmri_path,
            mask_img=sessions[0].mask_path)

In [None]:
nimage.clean_img(fmri_img, confounds=conf,
                                t_r=t_r, mask_img=mask_img,
                                **_params.clean_defs)

In [None]:
smooth_signals = [common_maps_masker.fit_transform(nimage.smooth_img(nimage.load_img(sess.fmri_path)))]

In [None]:
# dimensions = [1024, 512, 256, 128, 64]

# di1024, di512, di256, d1128, di64 = [get_difumo(dimension, 3, atlases_dir)
#                                      for dimension in tqdm_(dimensions)]

# atlases = [di1024, di512, di256, d1128, di64]


In [None]:
# X00 = session00.computed_.signal_matrix.copy(deep=True)
# X01 = session01.computed_.signal_matrix.copy(deep=True)




#             try:
                
#             except UserWarning:
#                 break

# agg00 = untangle(X00, session00.tasks[2])
# uX00 = X00[agg00.factor_leaders_]
# agg_test = untangle(uX00, session00.tasks[2])
# uX00b = X00[agg_test.factor_leaders_]
# agg00b = untangle(uX00b, session00.tasks[2])
# uX00c = X00[agg00b.factor_leaders_]
# agg00c = untangle(uX00c, session00.tasks[2])
# uX00d = X00[agg00c.factor_leaders_]
# agg00d = untangle(uX00d, session00.tasks[2])
# uX00e = X00[agg00d.factor_leaders_]
# agg00e = untangle(uX00e, session00.tasks[2])
# uX00f = X00[agg00e.factor_leaders_]
# agg00f = untangle(uX00f, session00.tasks[2])
# uX00g = X00[agg00f.factor_leaders_]
# agg00g = untangle(uX00g, session00.tasks[2])
## Proof of concept

# shorter = pairwise_correlates(X)
# newcols = [c for c in shorter.columns.tolist()
#            if ',' in c]
# len(','.join(newcols).split(',')) == len(set(','.join(newcols).split(',')))

In [None]:
session00.signal_matrix

In [None]:
X_big = pd.concat([ses.computed_.signal_matrix
                   for ses in sessions3])
[setattr(ses, 'tasks', [ses.events[col] for col in trial_type_cols])
 for ses in sessions3]
tasks_big = [pd.concat([ses.tasks[task[0]] for ses in sessions3])
             for task in enumerate(sessions3[0].tasks)]
display(X_big.iloc[:divmod(X_big.shape[0], 8)[0],:].shape,
        tasks_big[2].iloc[:divmod(X_big.shape[0], 8)[0]].shape)

In [None]:
warnings.filterwarnings(action='ignore', category=UserWarning)
[np.mean([validate_model(next(LinearSVCGen(**dict(max_iter=100000))),
                sess.computed_.signal_matrix,
                sess.tasks[2],
                test_size=0.8,
                stratify=sess.tasks[2]).accuracy.mean()
          for n in tqdm_(range(20))])
 for sess in [session00, session01, session02]]

In [None]:
from cimaq_decoding_utils import factorGenerator
list(factorGenerator(sessions[0].computed_.signal_matrix.shape[1]))

In [None]:
shorts = [recursive_untangle(sess.signal_matrix,
                             sess.tasks[2],
                             n_clusters=None)#divmod(sess.computed_.signal_matrix.shape[1], 2)[0] - 1)
          for sess in tqdm_(sessions)]

In [None]:
shorts[0].shape

In [None]:
warnings.filterwarnings(action='ignore', category=UserWarning)
[np.mean([validate_model(next(LinearSVCGen(**dict(max_iter=100000))),
                shorts[sess[0]],
                sess[1].tasks[2],
                test_size=0.8,
                stratify=sess[1].tasks[2]).accuracy.mean()
          for n in tqdm_(range(20))])
 for sess in enumerate([session00, session01, session02])]

In [None]:
group_short = recursive_untangle(X_big, tasks_big[2], n_clusters=None)

In [None]:
display(group_short)

In [None]:
[setattr(ses, 'masker', next(MaskerGen(ses))) for ses in tqdm_(sessions)]

In [None]:
[sess.update({'smooth_fmri': nimage.smooth_img(
    sess.masker.inverse_transform(
        sess.signal_matrix.set_index('trial_type')), 8)})
 for sess in tqdm_(sessions)
 if 'V03' in sess.fmri_path]

In [None]:
from collections import Counter

Counter(flatten(short.columns.tolist()
                   for short in shorts)).most_common()

In [None]:
len(sessions)

In [None]:
validate_model(next(LinearSVCGen(max_iter=100000)),
                            X_big.set_axis(tasks_big[2],
                                           axis=0).iloc[:divmod(X_big.shape[0], 8)[0],:],
                            tasks_big[2].iloc[:divmod(X_big.shape[0], 8)[0]],
                            test_size=0.8,
                            stratify=tasks_big[2].iloc[:divmod(X_big.shape[0], 8)[0]])

In [None]:
score_big = \
    np.mean([validate_model(next(LinearSVCGen(max_iter=1000000)),
                            X_big.set_axis(tasks_big[2], axis=0),
                            tasks_big[2],
                            test_size=0.8,
                            stratify=tasks_big[2]).accuracy.mean()
                       for n in tqdm_(range(20))])


In [None]:


[ses.update({'reduced_signals':
             recursive_untangle(ses.signal_matrix,
                                ses.tasks[2]).set_index('trial_type')})
 for ses in tqdm_(sessions)]

In [None]:
[ses.reduced_signals.set_index('trial_type', inplace=True)
 for ses in sessions]

In [None]:
from collections import Counter
Counter(flatten((ses.reduced_signals.set_index('trial_type').columns.tolist()
                 for ses in sessions))).most_common()

In [None]:
from builtins import UserWarning
warnings.simplefilter('ignore', UserWarning)
scores = [np.mean([validate_model(next(LinearSVCGen()),
                                  ses.reduced_signals,
                                  ses.tasks[2], test_size=0.8,
                                  stratify=ses.tasks[2]).accuracy.mean()
                   for n in range(20)])
          for ses in tqdm_(sessions)]

In [None]:
scores

In [None]:
pd.DataFrame()

In [None]:
set(flatten(tst.columns.tolist() for tst in test)).__len__()

In [None]:
from cimaq_decoding_pipeline import weightings

X00cp = X00.copy(deep=True)
X00cp.set_axis(session00.tasks[2]).loc['Ctl'] = X00cp.set_axis(session00.tasks[2]).loc['Ctl']/3

In [None]:
# linsvc00.decision_table_
support_vector_indices = list(set(list(np.where(np.abs(linsvc00.decision_table_) <= 1 + 1e-15)[0])))
support_vectors = X00.iloc[[ind for ind in range(X00.shape[0])
                            if ind not in np.array(support_vector_indices)]]
tst_vecs = support_vectors.abs().mean().sort_values(ascending=False).head(40).index
sel_ = results01[-2]['selected'].columns

In [None]:
validate_model(next(LinearSVCGen()), X01, session01.tasks[2],
               test_size=0.8)

In [None]:
display(next(LinearSVCGen()).fit(X00, session00.tasks[2]).densify().coef_.round(5)[0] ==
 next(LinearSVCGen()).fit(X00, session00.tasks[2]).coef_.round(5)[0])

In [None]:
linsvc00 = next(LinearSVCGen()).fit(X00, session00.tasks[2])

linsvc00.coef_table_ = pd.DataFrame(linsvc00.coef_,
                                    index=linsvc00.classes_,
                                    columns=linsvc00.feature_names_in_)

linsvc00.decision_table_ = pd.DataFrame(linsvc00.decision_function(X00),
                                        columns=linsvc00.classes_,
                                        index=linsvc00.predict(X00))


display(linsvc00.coef_table_, linsvc00.coef_)

# linsvc01 = next(LinearSVCGen())

# linsvc01.fit(linsvc00.coef_table_,
#              linsvc00.coef_table_.index.values)

# linsvc01.score(X00, session00.tasks[2])
# linsvc01.coef_table_ = pd.DataFrame(linsvc01.coef_, index=linsvc01.classes_,
#                                    columns=linsvc01.feature_names_in_)

# linsvc01.decision_table_ = pd.DataFrame(linsvc01.decision_function(X),
#                                        columns=linsvc01.classes_,
#                                        index=linsvc01.predict(X))

# display(linsvc01.score(X, tasks[1]),
#         [row[1][row[1]==row[1].max()]
#          for row in linsvc01.coef_table_.iterrows()],
#         linsvc01.decision_table_)


# len(list(filter(None, linsvc01.predict(X)==tasks[1].values)))/len(tasks[1].values)
# linsvc00.coef_table_
# display(linsvc00.score(X,  linsvc00.predict(X)),
#         linsvc00.decision_table_, linsvc00.coef_table_)

In [None]:
linsvc00.coef_.argsort()

In [None]:
X00.__dict__

In [None]:
from matplotlib import pyplot as plt
from sklearn import svm

def f_importances(estimator,
                  X: Iterable,
                  y: Iterable,
                  importance_getter: Union[str, callable] ='coef_',
                  names=None):

    import numpy as np
    import pandas as pd
    from operator import attrgetter
    
    estimator.fit(X, y)
    if hasattr(estimator, 'feature_names_in_'):
        feature_names_in_
        if names is None:
        names = coef.columns
#     coef, names = zip(*list(zip(abs(coef),
#                                 names[np.argsort(coef)])))
    return names[np.argsort(coef.abs())]
#     coef, names = coef[:25], names[:25]
#     plt.barh(range(len(names)), coef,
#              align='center')
#     plt.yticks(range(len(names)), names)
#     plt.show()
#     return pd.DataFrame(zip(coef, names))
    
test_coef = f_importances(next(LinearSVCGen()),
                          X00,
                          session00.tasks[2])
test_coef[0]
#               names=linsvc00.coef_table_.columns.tolist())

In [None]:
# (linsvc00.coef_table_.loc['Cs'].abs().mul(100).round(4)).sort_values()# - 
#  linsvc00.coef_table_.loc['Ctl'].abs().mul(100).round(4)).sort_values(ascending=False)
subset_ = linsvc00.coef_table_.mean().abs().sort_values(ascending=False).head(30).index
 

In [None]:
best10 = [linsvc00.coef_table_.pow(2).T.sort_values(col, ascending=False).index.tolist()[
    :divmod(linsvc00.coef_table_.shape[1], 8)[0]
]
 for col in linsvc00.coef_table_.pow(2).T.columns]

In [None]:
len(flatten(best10))
col_set = list(set(flatten(best10)))

In [None]:
pd.DataFrame([[(row[1].index==cond) for row in results00['selected'].iloc[-1].T.iterrows()]
          for cond in results00['selected'].iloc[-1].index.unique()],
         dtype=int)

In [None]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.inspection import permutation_importance

ohe_ = pd.DataFrame(OneHotEncoder(sparse=False).fit_transform(session00.tasks[2].values.reshape(-1, 1)))

# linsvc00b = LinearSVC(max_iter=10000, class_weight='balanced')
# linsvc00b.fit(results00['selected'].iloc[-1],
#               session00.tasks[2].values)

logreg00 = LogisticRegression(max_iter=10000, multi_class='ovr', n_jobs=11)
logreg00.fit(results00['selected'].iloc[-1],
             session00.tasks[2])
logreg00.coef_table_ = pd.DataFrame(logreg00.coef_,
                                    index=logreg00.classes_,
                                    columns=logreg00.feature_names_in_)
logreg00.predict(results00['selected'].iloc[-1])
logreg00.score(results00['selected'].iloc[-1],
               session00.tasks[2])


importances = pd.DataFrame(data=permutation_importance(logreg00,
                                                       results00['selected'].iloc[-1],
                                                       session00.tasks[2],
                       scoring='accuracy', n_jobs=11).importances,
                        index=logreg00.feature_names_in_)
importances
# logreg00.predict_proba(results00['selected'].iloc[-1]).shape

In [None]:
X00.set_axis(session00.tasks[2], axis=0).drop('Ctl', axis=0)

In [None]:
session00.tasks[2]

In [None]:
from sklearn.ensemble import RandomForestClassifier

X = X00.set_axis(session00.tasks[2], axis=0).drop('Ctl', axis=0)
y = X00.set_axis(session00.tasks[2], axis=0).drop('Ctl', axis=0).index
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.8,
                                                    random_state=42,
                                                    stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy on test data: {:.2f}".format(clf.score(X_test, y_test)))

In [None]:
result = permutation_importance(clf, X_train, y_train, n_repeats=10, random_state=42)
perm_sorted_idx = result.importances_mean.argsort()

tree_importance_sorted_idx = np.argsort(clf.feature_importances_)
tree_indices = np.arange(0, len(clf.feature_importances_)) + 0.5
plotted = result.importances[perm_sorted_idx].T
plotted_labels = X.columns[perm_sorted_idx]

plotted.shape

# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))
# ax1.barh(tree_indices, clf.feature_importances_[tree_importance_sorted_idx], height=0.7)
# ax1.set_yticks(tree_indices)
# ax1.set_yticklabels(data.feature_names[tree_importance_sorted_idx])
# ax1.set_ylim((0, len(clf.feature_importances_)))

# ax2.boxplot(plotted, vert=False, labels=plotted_labels)
# fig.tight_layout()
# plt.show()

In [None]:
ordered_cols = [[col[1] for col in enumerate(X.columns) if col[0] == idx][0]
                for idx in perm_sorted_idx]

X_permutation = X[ordered_cols].iloc[:, :30]
X_subset = X[subset_]


[col for col in X_subset.columns
 if col in X_permutation.columns
 ]

In [None]:
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))
from collections import defaultdict
from scipy.stats import spearmanr
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

corr = spearmanr(X).correlation

# Ensure the correlation matrix is symmetric
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)

# We convert the correlation matrix to a distance matrix before performing
# hierarchical clustering using Ward's linkage.
distance_matrix = 1 - np.abs(corr)
dist_linkage = hierarchy.ward(squareform(distance_matrix))
# dendro = hierarchy.dendrogram(dist_linkage,
#                               labels=X.columns.tolist(),
#                               ax=ax1, leaf_rotation=90)

In [None]:


# dendro_idx = np.arange(0, len(dendro["ivl"]))
cluster_ids = hierarchy.fcluster(dist_linkage, 1, criterion="distance")

cluster_id_to_feature_ids = defaultdict(list)
for idx, cluster_id in enumerate(cluster_ids):
    cluster_id_to_feature_ids[cluster_id].append(idx)
selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]

X_train_sel = X_train[:, selected_features]
X_test_sel = X_test[:, selected_features]

clf_sel = RandomForestClassifier(n_estimators=100, random_state=42)
clf_sel.fit(X_train_sel, y_train)
print(
    "Accuracy on test data with features removed: {:.2f}".format(
        clf_sel.score(X_test_sel, y_test)
    )
)
# ax2.imshow(corr[dendro["leaves"], :][:, dendro["leaves"]])
# ax2.set_xticks(dendro_idx)
# ax2.set_yticks(dendro_idx)
# ax2.set_xticklabels(dendro["ivl"], rotation="vertical")
# ax2.set_yticklabels(dendro["ivl"])
# fig.tight_layout()
# plt.show()

In [None]:
importances_svc00 = pd.DataFrame(permutation_importance(linsvc00, X00,
                                                  session00.tasks[2],
                                                  scoring='accuracy',
                                                  n_jobs=11).importances,
                        index=linsvc00.feature_names_in_)

In [None]:
permutation_importance(linsvc00, X=X00,
                                                  y=session00.tasks[2],
                                                  scoring='accuracy',
                                                  n_jobs=11)

In [None]:
session00.new_roi_indexes = [[col[0] for col in enumerate(X00.columns)
                              if col[1] in newcol.split(',')]
                             for newcol in new_features.columns]
session00.new_maps_img = [nimage.mean_img(nimage.index_img(session00.masker._resampled_maps_img_, idx))
                          for idx in session00.new_roi_indexes]

In [None]:
session00.new_features = new_features

session00.new_masker = NiftiMapsMasker(maps_img=session00.new_maps_img,
                                       mask_img=session00.mask_img,
                                       t_r=get_t_r(session00.fmri_img),
                                       resampling_target='mask',
                                       **session00.masker_defs).fit()


In [None]:
session00.new_fmri_img = session00.new_masker.inverse_transform(session00.new_features)

In [None]:
new_imgs = {}
[new_imgs.update(itm) for itm in
 flatten([[{cond: session00.new_masker.inverse_transform(session00.new_features.set_axis(
               session00.tasks[task[0]], 0).loc[cond])}
                     for cond in session00.tasks[task[0]].unique()]
                    for task in enumerate(session00.tasks)])]


In [None]:
init_imgs = {}
[init_imgs.update(itm) for itm in
 flatten([[{cond: session00.masker.inverse_transform(session00.computed_.signal_matrix.set_axis(
               session00.tasks[task[0]], 0).loc[cond])}
                     for cond in session00.tasks[task[0]].unique()]
                    for task in enumerate(session00.tasks)])]
# niplot.plot_connectome(session00.new_features.corr('spearman'),
#                        niplot.find_probabilistic_atlas_cut_coords(session00.new_fmri_img))

In [None]:
# session00.computed_.signal_matrix.max().max()
sum(np.array([1,2,3,4]), np.array([1,2,3,4]))

In [None]:
[niplot.plot_glass_brain(nimage.mean_img(itm[1]),
                         title=f'Before RFE & Clustering: {itm[0]}',
                         plot_abs=False,
                         colorbar=True,
                         black_bg=True)
 for itm in tuple(init_imgs.items())]

In [None]:
[niplot.plot_glass_brain(nimage.mean_img(itm[1]),
                         title=f'After RFE & Clustering: {itm[0]}',
                         plot_abs=False,
                         colorbar=True,
                         black_bg=True)
 for itm in tuple(new_imgs.items())]

In [None]:
new_imgs

In [None]:
colset = set(flatten([sorted(row[1].sort_values(ascending=False).iloc[:3].index.tolist())
 for row in linsvc00.coef_table_.pow(2).iterrows()]))
# linsvc00.coef_table_.pow(2)
display(len(colset),
        round(np.mean([validate_model(next(LinearSVCGen()),
                                      new_features,#X_subset,#X_permutation,#results00['selected'].iloc[-1],#.filter(regex='cingulate|hippo|occip'),#X01[colset],
                                      session00.tasks[2],
                                      stratify=session00.tasks[2],
                                      test_size=0.8).accuracy.round(2).mean()
         for n in tqdm_(range(30))]), 2),
        new_features#results00['selected'].iloc[-1]#.filter(regex='cingulate|hippo|occip'),
        )

In [None]:
# {cond:linsvc2.coef_table_[linsvc2.coef_table_==linsvc2.coef_table_.loc[cond].max()].loc[cond].dropna()
#  for cond in linsvc2.coef_table_.index}
from cimaq_decoding_utils import factorGenerator
list(factorGenerator(X00.shape[1])), list(factorGenerator(X01.shape[1]))

In [None]:
from cimaq_decoding_utils import factorGenerator
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import SelectFdr, SelectFpr
from sklearn.feature_selection import SelectFwe, SelectPercentile


score_funcs = pd.Series([chi2, SelectFdr, SelectFpr,
                     SelectFwe, SelectKBest,
                     f_classif, f_regression,
                     mutual_info_classif,
                     mutual_info_regression,
                     GenericUnivariateSelect],
                    index=['chi2', 'SelectFdr',
                           'SelectFpr', 'SelectFwe',
                           'SelectKBest', 'f_classif',
                           'f_regression',
                           'mutual_info_classif',
                           'mutual_info_regression',
                           'GenericUnivariateSelect'])

score_funcs.loc['GenericUnivariateSelect']

In [None]:
sorted(pca_new.feature_names_in_, key=np.argsort(pca_new.explained_variance_))
# help(sorted)

In [None]:
X00[agg00g.factor_leaders]
['supramarginal gyrus postero-superior lh',
 'subparietal sulcus anterior rh',
 'postcentral gyrus superior lh']
['middle frontal gyrus posterior rh',
 'superior frontal sulcus anterior lh',
 'superior temporal gyrus anterior medial']
['lateral occipital cortex postero-inferior rh',
 'angular gyrus middle rh',
 'cingulate sulcus posterior rh']

In [None]:
# (session00.tasks[2].tolist()+session01.tasks[2].tolist()).__len__()
# X_big = pd.concat([X00[agg00f.factor_leaders],
#                    X01[agg00f.factor_leaders]])
X_big = pd.concat([X00, X01])
y_big = np.array(session00.tasks[2].tolist()+session01.tasks[2].tolist())
X_big.shape, y_big.shape

In [None]:
np.mean([validate_model(next(LinearSVCGen()),
                        X=X_big[['insula inferior rh',
                                 'superior temporal sulcus posterior rh',
                                 'frontomarginal gyrus rh',
                                 'lateral occipital cortex postero-inferior rh',
                                 'angular gyrus middle rh',
                                 'cingulate sulcus posterior rh',
                                 'middle frontal gyrus posterior rh',
                                 'superior frontal sulcus anterior lh',
                                 'superior temporal gyrus anterior medial']],
#                                  'supramarginal gyrus postero-superior lh',
#                                  'subparietal sulcus anterior rh',
#                                  'postcentral gyrus superior lh']],
#                X01[['lateral occipital cortex postero-inferior rh',
#                     'angular gyrus middle rh',
#                     'cingulate sulcus posterior rh']],
                        y=y_big,
                        test_size=0.8,
                        stratify=y_big
                       ).accuracy.mean()
  for n in range(50)])

In [None]:
# from sklearn.metrics import explained_variance_score
# y_true = session00.tasks[2]
# linsvc = LinearSVC(max_iter=100000,
#                    class_weight='balanced')
# linsvc.fit(X00, y_true)

# RFECV(next(LinearSVCGen(**dict(max_iter=100000,
#                                class_weight='balanced'))),
#       min_features_to_select=X00.shape[1]-1,
#       importance_getter='coef_',
#       cv=StratifiedKFold(),
#       n_jobs=11).fit(X00, y_true).support_


In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, RFE
from sklearn.linear_model import LinearSVC
from sklearn.metrics import explained_variance_score
from typing import Iterable, Union

def RFECVPCASVCKBest(X: Iterable,
                     y: Iterable,
                     step: Union[int, float] = 1,
                     importance_getter: Unioin[str, callable] = 'coef_'
                     **kwargs
                     ) -> np.array:
    from cimaq_decoding_utils import validate_model
    linsvc = LinearSVC(max_iter=10000, class_weight='balanced')
    linsvc.fit(X, y)
    coef_table_ = pd.DataFrame(linsvc.coef_,
                               index=linsvc.classes_
                               columns=linsvc.feature_names_in_)
    explained_variance_ = 
    
SelectKBest

In [None]:
help(CM)

In [None]:
connect_mat = CM(LedoitWolf(),
                 kind='correlation',
                 discard_diagonal=True).fit_transform([X.values])[0]
connect_mat = pd.DataFrame(connect_mat).where(connect_mat!=np.tril(connect_mat), 0)
connect_mat.min()
# connect_mat.shape, sns.heatmap(connect_mat)
# pd.DataFrame()#.describe()
# connect_mat

In [None]:
steps = list(factorGenerator(X00.shape[1]))

# range(steps[0], steps[-1])
connect_mat.max()

In [None]:



agglo_defs = dict(affinity='euclidean',
                  compute_full_tree='auto',
                  linkage='ward',
                  pooling_func=np.mean,
                  distance_threshold=None,
                  compute_distances=True)

connect_mat = CM(LedoitWolf(),
                 kind='covariance').fit_transform([X.values])[0]
agg = FeatureAgglomeration(n_clusters=divmod(X00.shape[1], 4)[0],#list(factorGenerator(X00.shape[1]))[-1]-6,
                     connectivity=connect_mat,
                     **agglo_defs)

rfe = RFE(estimator=PCA(), step=1,
              n_features_to_select=1,
              importance_getter='explained_variance_ratio_')

rfe2 = RFE(estimator=PCA(), step=0.05,
              n_features_to_select=0.1,
              importance_getter='explained_variance_ratio_')

agg.fit(X00, session00.tasks[2])

sorting = pd.DataFrame(zip(agg.labels_, agg.feature_names_in_),
                       columns=['labels_', 'feature_names_in_'])
cluster_ids = list(sorting.groupby('labels_').groups.values())

cluster_leaders = np.array([rfe.fit(X=X00.iloc[:, clust],
                           y=session00.tasks[2]).get_feature_names_out()[0]
                   if len(clust) > 1 else X00.iloc[:, clust].columns[0]
                   for clust in cluster_ids])

rfe2.fit(X00[cluster_leaders], session00.tasks[2])

display(X00[rfe2.get_feature_names_out()].shape,
        X00[rfe2.get_feature_names_out()],
        validate_model(next(LinearSVCGen()),
               X00[cluster_leaders], session00.tasks[2],
               test_size=0.8,
               stratify=session00.tasks[2]).round(2))
#     AgglomerateMulticollinearCorrelated()

In [None]:
est=PCA()
est.fit(X00.corr('spearman'))
ev=est.explained_variance_
nm=est.feature_names_in_[np.argsort(ev)]
# display(tuple(zip(nm, X00.columns)))
explained_variance_ = pd.DataFrame(zip(nm, ev))
# hierarchy.ward(squareform(1-explained_variance_ratio_.corr('spearman')))
# pd.DataFrame(,
#              index=PCA().fit(X00.corr('spearman')),
#              columns=['explained_variance_ratio_']).sort_values('explained_variance_ratio_',
#                                                                 ascending=False)

In [None]:
display(X00.cov().round(2), pd.DataFrame(connect_mat,index=X00.columns,
                                         columns=X00.columns).round(2))

In [None]:
list(factorGenerator(X00.shape[1]))
from itertools import combinations

cond_unique_features_ = dict(tuple((itm, pd.Series(agglo00[itm[0]].new_features_.columns.tolist() +
    agglo00[itm[1]].new_features_.columns.tolist()).nunique())
                                   for itm in list(combinations(agglo00.keys(), 2))))
cond_unique_features_
# [len(val) for val in tuple(cond_unique_features_.values())]
# .symmetric_difference

In [None]:
# skb = SelectKBest(score_func=score_funcs.loc[score_func], k=k)
# estimator_ = RFE(estimator=skb, step=step,
#                  n_features_to_select=k,
#                  importance_getter='scores_')
from sklearn.metrics import explained_variance_score
from sklearn.decomposition import FastICA, PCA

score_funcs = pd.Series([chi2,
                         f_classif, f_regression,
                         mutual_info_classif,
                         mutual_info_regression,
                         GenericUnivariateSelect],
                        index=['chi2',
                               'f_classif', 'f_regression',
                               'mutual_info_classif',
                               'mutual_info_regression',
                               'GenericUnivariateSelect'])

pca = PCA()
skb = SelectKBest(score_func=score_funcs['mutual_info_classif'], k=10)
ica = FastICA(n_components=1, max_iter=10000,
              algorithm='parallel',
              whiten=True, fun='logcosh',
              tol=0.0001, w_init=None,
              random_state=None)


In [None]:
rfe.get_feature_names_out()

In [None]:
# clusters_by_condition = AgglomerateMulticollinearCorrelated(X)

X00task = X00.set_axis(session00.tasks[2], axis=0)
agglo00 = {}

[agglo00.update({cond: AgglomerateMulticollinearCorrelated(X=X00task.loc[cond],
                                                           y=X00task.loc[cond].index,
                                                           n_clusters=divmod(X00.shape[1], 4)[0])})
 for cond in session00.tasks[2].unique()]

pca00 = {}

[pca00.update({key: SortFeaturesByConditionPCA(X=agglo00[key].new_features_,
                                   y=agglo00[key].new_features_.index,
                                   n_features=4)})
                    for key in tuple(agglo00.keys())]

In [None]:
RFE(PCA(), )

In [None]:
pca00['Cs'][0].feature_names_in_.shape
pca00.keys()

In [None]:
# display(pca_filtered_[0][0].attributes_.shape, ,
#        pca_filtered_[0][0].feature_names_out_.shape)
display(*[pca00[key][0].components_table_#attributes_.T.sort_values('explained_variance_',
#                                                   ascending=False)
          for key in tuple(pca00.keys())])
# display(pca_filtered_[0][0].feature_names_out_.shape,
#         pca_filtered_[0][0].covariance_table_,
#         pca_filtered_[0][0].components_table_)




In [None]:
# agglo00[0].feature_names_out_==agglo00[0].feature_names_in_
display(*[
          
          agglo00[key].new_features_ for key in tuple(agglo00.keys())])

In [None]:
connect_mat = CM(LedoitWolf(),
                 kind='correlation').fit_transform([X00.values])[0]
    
# niplot.plot_connectome(session00.new_features.corr('spearman'),
#                        niplot.find_probabilistic_atlas_cut_coords(session00.new_fmri_img))

In [None]:
agglo.n_leaves_, agglo00.children_
agglo.__dict__.keys()
agglo.connectivity230,
 ('Miss', 'Cs'): 

In [None]:



method_list = ['euclidean','spearman', 'pearson']
features_by_cond = dict(tuple((method, SortFeaturesByConditionPCA(X00, session00.tasks[2],
                                               method=method,
                                               n_features = 20))
                    for method in tqdm_(method_list)))
# pca_new = PCA().fit(X00.set_axis(session00.tasks[2],axis=0).loc['Ctl'].corr('spearman'))
# display(pca_new.components_.shape, pca_new.explained_variance_.shape,
#  np.argsort(pca_new.explained_variance_),
#        )

In [None]:
features_by_cond

In [None]:
pca_test = PCA().fit(X00.corr())

In [None]:
pca_subset = list(set(flatten([v.tolist() for v in tuple(features_by_cond.values())])))
print(len(pca_subset))
validate_model(next(LinearSVCGen()),
               X=X00[pca_subset],#results01[-5]['selected'],#[col].values.reshape(-1,1),
               y=session00.tasks[2], test_size=0.8)

In [None]:
# X = maps_cdict.signal_matrix.copy(deep=True)
# y = tasks[2]




cols = '|'.join(['hippo', 'cuneus','rhinal',
                 'lateral fissure posterior limb lh',
                 'retrocalcarine cortex rh'])
results00 = RecurviseKBestLinearSVC(X=X00,
#                                     [list(set(flatten(X_subset.columns.tolist() +
#                                                            X_permutation.columns.tolist())))],
                                    y=session00.tasks[2],
#                                     half=False,
                                    score_func='mutual_info_classif',#'mutual_info_classif',
                                    step=0.1)
# results01 = RecurviseKBestPCA(X01, session01.tasks[2])

In [None]:
# results00['selected'].iloc[]
list(factorGenerator(130))

In [None]:
display(*results00['performance'], results00['selected'].iloc[-1])


In [None]:
validate_model(next(LinearSVCGen(**{"class_weight": "balanced"})),
               results00['selected'].iloc[-1],
               session00.tasks[2],
               stratify=session00.tasks[2],
               test_size=0.8).round(2)

In [None]:
tradeoff00 = dict(tuple((rez['n_features'], round(rez['performance'], 2))
                        for rez in results00))
# tradeoff01 = dict(tuple((rez['n_features'], round(rez['performance'], 2))
#                         for rez in results01))
display(tradeoff00)
# results01[-2]['selected'].columns

In [None]:
rfe = RFE(estimator=next(LinearSVCGen()),
                         step=1,
                         n_features_to_select=None,
                         importance_getter='coef_')

rfe.fit(results01[-5]['selected'],session00.tasks[2]).get_feature_names_out()

In [None]:
# pd.DataFrame([[round(validate_model(next(LinearSVCGen()),
#                X=results01[-2]['selected'][col].values.reshape(-1,1),
#                y=session01.tasks[2], test_size=0.8).accuracy.mean(), 2)
#  for col in results01[-2]['selected'].columns]
#  for n in range(5)]).mean()

validate_model(next(LinearSVCGen()),
               X=X00[pca_subset],#results01[-5]['selected'],#[col].values.reshape(-1,1),
               y=session00.tasks[2], test_size=0.8)

In [None]:
from sklearn.manifold import TSNE

tsne = TSNE(n_components=3,
            perplexity=len(not_all_conds),
            learning_rate='auto', init='pca',
            angle=0.3, n_jobs=11,
            square_distances=True)
tsne.fit(X[not_all_conds])
tsne_dims = tsne.fit_transform(X[not_all_conds])

In [None]:
tsne_dims.shape

In [None]:
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.multiclass import OutputCodeClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

def rfecv_forest(X, y):
    pca_per_cond = Bunch()
    for cond in tqdm_(y.unique().tolist()):
        X = X.set_axis(y, axis=0)
        for n in range(abs((X.shape[1]-1) - 50)):
            

            estimator_ = RFE(estimator=PCA(),
                               step=1,
                               n_features_to_select=50,
                               importance_getter='explained_variance_',
                               n_jobs=11)

            pca_per_cond.update({cond: dict(tuple((cond, estimator_.fit(X=X.loc[cond].values,     
                                                              y=y[y==cond]
                                                              ).get_feature_names_out())))})
            X = X[pca_per_cond[cond]]

    return pca_per_cond

best50 = rfecv_forest(X, y=tasks[2])
    # estimator_.__dict__
# estimator_.fit(X.loc['Hit'], tasks[1].)
# recursive_per_cond = Bunch(**dict(tuple((cond, estimator_.fit(X.loc[cond].values,
# #                                                               y.loc[cond].values
#                                                               ).get_support(indices=True))
#                                         for cond in tasks[1].unique())))


# sel = SequentialFeatureSelector(estimator=estimator_,
#                                 n_features_to_select=X.shape[1]-1,
#                                 scoring='explained_variance',
#                                 direction='forward',
#                                 n_jobs=11)




In [None]:
pca_per_cond['Miss'].shape

In [None]:
# pca.components_.T.shape
PCA().fit(X).transform(X)

In [None]:
(pca.explained_variance_.shape,
 pca.explained_variance_ratio_.shape,
 pca.singular_values_.shape)

In [None]:
from sklearn.decomposition import IncrementalPCA, PCA, FastICA

pca = PCA().fit(X.set_axis(tasks[2], axis=0))
#     X.set_axis(tasks[0],axis=0).loc['Ctl'])

# ica_ctl = FastICA(max_iter=2000, tol=5e-2).fit(weightings(X.set_axis(tasks[1], axis=0), linsvc.coef_table_).loc['Ctl'])

pca.components_attributes_ = pd.DataFrame(zip(pca.explained_variance_,
                                          pca.explained_variance_ratio_,
                                          pca.singular_values_),
                                          index=tasks[2],
                                          columns=['explained_variance',
                                                   'explained_variance_ratio',
                                                   'singular_values'])
pca.cov_table_ = pd.DataFrame(pca.get_covariance(),
                                  index=pca.feature_names_in_,
                                  columns=pca.feature_names_in_)

pca.components_table_ = pd.DataFrame(pca.components_,
                                         index=tasks[2].values,
                                         columns=pca.feature_names_in_)

# ica_ctl.components_.shape
# pca.components_table_.groupby(pca.components_table_.index).mean()
# pca.components_table_
pca.components_.shape, pca.explained_variance_.shape

In [None]:
# [pca.components_table_.loc['Ctl']==pca.components_table_.loc['Ctl'].max()]
# pca.components_table_[pca.components_table_==pca.components_table_.max()]
pd.DataFrame([row[1].sort_values(ascending=False)[:50]
              for row in pca.components_table_.loc['Ctl'].iterrows()])



In [None]:
ipca_ctl = IncrementalPCA()

ipca_ctl.fit(X.set_axis(tasks[0],axis=0).loc['Ctl'])
# [ipca_ctl.partial_fit(row[1].values.reshape(-1, 1))
#  for row in X.set_axis(tasks[0],axis=0).loc['Ctl'].iterrows()]

ipca_ctl.feature_names_in_ = X.columns.tolist()
# ipca_ctl.set_params(**{'feature_names_in_': X.columns.tolist()})
ipca_ctl.cov_table_ = pd.DataFrame(ipca_ctl.get_covariance(), index=ipca_ctl.feature_names_in_,
                                  columns=ipca_ctl.feature_names_in_)
ipca_ctl.components_table_ = pd.DataFrame(ipca_ctl.components_, columns=ipca_ctl.feature_names_in_)

In [None]:
sorted(dir(pca_ctl))
# pca_ctl.feature_names_in_
#==X.set_axis(tasks[0],axis=0).loc['Ctl']
# pca_ctl.cov_table_.pow(-1)
# pca_ctl.explained_variance_ratio_.shape, pca_ctl.explained_variance_.shape
# pca_ctl.score(pca_ctl.components_table_)
ipca_ctl.get_precision()
# pca_ctl.components_table_.where(np.isinf(pca_ctl.components_table_.values)).notna().sum().sum()

In [None]:
# ipca_ctl.get_params().keys()
pca_ctl.explained_variance_.shape

In [None]:
sorted(dir(linsvc))
# sorted(dict(inspect.getmembers(linsvc, callable)).keys())
# help(linsvc.score)#(tasks[2], linsvc.predict(X))
#tasks[2])

In [None]:

# display(linsvc.coef_table_[linsvc.coef_table_==linsvc.coef_table_.max()].T.head(30),
#         linsvc.coef_table_[linsvc.coef_table_==linsvc.coef_table_.min()].T.head(30),
#         linsvc.decision_table_.head(20))

In [None]:
# linsvc.coef_table_[linsvc.coef_table_==linsvc.coef_table_.min()]
linsvc.coef_table_.T[linsvc.coef_table_.T==linsvc.coef_table_.T.min()].dropna(axis=0, how='all')

In [None]:
pairwise_correlates(linsvc.coef_table_)

In [None]:
sorted(dict(inspect.getmembers(pd.DataFrame.corr)).keys())

In [None]:
from sklearn.metrics import pairwise_distances_argmin, pairwise_distances_argmin_min
from scipy.stats import spearmanr
# (linsvc.decision_table_, linsvc.coef_table_)
# help(pairwise_distances_argmin)#(X.set_axis(tasks[2], axis=0), X.set_axis(tasks[2], axis=0))
# pd.Series(pairwise_distances_argmin(X.T,X.T, metric=lambda X, y: scipy.stats.spearmanr(X, y).correlation)).nunique()

In [None]:
trimmed_corr_mat(X.set_axis(tasks[2], axis=0).loc['Ctl'],
                 method='spearman', thresh=0.9, positive=True)

trimmed_corr_mat(linsvc.coef_table_.loc['Ctl'],
                 method='spearman', thresh=0.9, positive=True)


In [None]:
pd.DataFrame(scipy.stats.spearmanr(X, X).correlation)

In [None]:
# img_data = [np.uint8(NiftiMasker().fit_transform(img)).flatten()
#             for img in tqdm_(list(nimage.iter_img(session.masker._resampled_maps_img_)))]
# maps_vectors = pd.DataFrame(pairwise_distances(np.transpose(np.array([img.get_fdata().flatten()
#                          for img in nimage.iter_img(session.masker._resampled_maps_img_)]))),
#                 index=X.columns, columns=X.columns)

In [None]:
import inspect
sorted(dict(inspect.getmembers(agglo, callable)).keys())
# max(flatten(agglo.children_.tolist()))

In [None]:
from sklearn.neighbors import NearestNeighbors
from scipy.spatial.distance import squareform
from sklearn.covariance import EmpiricalCovariance, MinCovDet
from sklearn.covariance import GraphicalLassoCV, GraphicalLasso, LedoitWolf
# nn = NearestNeighbors(n_neighbors=3)



agglo.labels_mapping_ = pd.DataFrame(tuple(zip(agglo.labels_,
                                               agglo.feature_names_in_))).groupby(0).groups
agglo.features_names_mapping_ = dict(tuple((itm[0], X.iloc[:, itm[1]].columns.tolist())
                                     for itm in tuple(agglo.labels_mapping_.items())))
agglo.feature_names_out_ = [','.join(list(set(' '.join(val).split()))) for val in
                            tuple(agglo.features_names_mapping_.values())]

agglo.features_names_mapping_
# roi_nn = nn.fit(maps_vectors).kneighbors()

# roi_nn = Bunch(**dict(tuple(zip(['neigh_dist', 'neigh_ind'],
#                                 roi_nn))))
# pd.DataFrame.gt(X.corr('spearman').mask(
#     np.triu(np.ones_like(X.corr('spearman'),
#                          dtype=bool))), 0.9).any().stack()

In [None]:

rfecv = RFECV(
    estimator=next(LinearSVCGen()),
    min_features_to_select=10,
    step=5,
    n_jobs=11,
    scoring="accuracy",
    cv=5,
)


best_clusters = [new_features[rfecv.fit(new_features, task).get_support(indices=True)]
                 for task in tasks]

display(*best_clusters)
# best_of_ = [dict(tuple((itm[0], rfecv.fit(X.iloc[:, itm[1]].values,
#                                 task).get_support(indices=True))
#              for itm in tuple(agglo.labels_mapping_.items())
#              if len(itm[1])>1))
#             for task in tasks]

# best_indices = rfecv.fit(new_features, tasks[2]).get_support(indices=True)
# new_features.iloc[:, rfecv.fit(new_features, tasks[2]).get_support(indices=True)]

In [None]:
best_of_

In [None]:
display(*[[pd.Series(dict(tuple((idx,
                                 validate_model(next(LinearSVCGen()),
                                                X=X.iloc[:, agglo.labels_mapping_[task[0]][idx]],
                                                y=task[1], test_size=0.8).accuracy.mean().round(2))
                                for idx in best_of_[task[0]]))).sort_values(ascending=False).iloc[[0]]]
          for task in enumerate(tasks)])
# trimmed_corr_mat(new_features)

In [None]:
niplot.view_img(nimage.mean_img(nimage.index_img(session.masker.maps_img, agglo.labels_mapping_[9])))

In [None]:
def find_pairwise_corrs(X: pd.DataFrame,
                        method: str = 'spearman',
                        thresh: float = 0.9,
                        positive: bool = True
                        ) -> list:
    """
    A function to identify highly correlated features.
    """
    # Compute correlation matrix with absolute values
#     matrix = X.corr(method).abs()
    matrix = X.corr(method)

    # Create a boolean mask
    mask = np.triu(np.ones_like(matrix, dtype=bool))
#     return matrix.mask(mask)
    
    # Subset the matrix
    masked_matrix = matrix.mask(mask)
    
#     # Find cols that meet the thresh
    eq_sign = next(get_corr_sign(thresh))
    to_drop = masked_matrix[eq_sign(masked_matrix, thresh)].stack().index.tolist()
    return to_drop
#     to_drop = [c for c in masked_matrix.columns if \
#               masked_matrix[eq_sign(masked_matrix[c], thresh)].any()]
    
#     return to_drop


In [None]:
def identify_correlated(X: pd.DataFrame,
                        method: str = 'spearman',
                        threshold: float = 0.9
                        ) -> list:
    """
    A function to identify highly correlated features.
    """
    # Compute correlation matrix with absolute values
#     matrix = X.corr(method).abs()
    matrix = X.corr(method).abs()

    # Create a boolean mask
    mask = np.triu(np.ones_like(matrix, dtype=bool))
    
    # Subset the matrix
    masked_matrix = matrix.mask(mask)
    
    # Find cols that meet the threshold
#     to_drop = [c for c in reduced_matrix.columns if \
#               any(reduced_matrix[c] > threshold)]
    
    return masked_matrix

def reduce_matrix(X: pd.DataFrame,
                        method: str = 'spearman',
                        threshold: float = 0.9
                        ) -> pd.DataFrame:
    masked_matrix = identify_correlated(X, method=method,
                                         threshold=threshold)
    to_drop = [c for c in masked_matrix.columns if
               any(masked_matrix[c] > threshold)]
    reduced_matrix = X.drop(to_drop, axis=1)
    return reduced_matrix

from sklearn.ensemble import RandomForestRegressor

rfecv_params = dict(step=5,
                    cv=5, n_jobs=11,
                    importance_getter='auto',
                    min_features_to_select=5,
                    scoring='accuracy')

# reduce_matrix(X).shape
# reduced_ = reduce_matrix(X)
# reduced_cv = reduced_[RFECV(next(LinearSVCGen()),
#                             **rfecv_params).fit(reduced_, tasks[2]).get_feature_names_out()]

# reduced_cv
# def iter_reduce_matrix(X: pd.DataFrame,
#                         method: str = 'spearman',
#                         threshold: float = 0.9
#                         ) -> pd.DataFrame:
    
    
# rfecv_matrices = [X[RFECV(next(LinearSVCGen()), **rfecv_params).fit(X, tasks[task[0]]).get_feature_names_out()]
#                   for task in tqdm_(enumerate(tasks))]


# print([mat.shape for mat in rfecv_matrices])

# fwd_reduced_matrices = [rfecv_matrices[task[0]]
#                         [SequentialFeatureSelector(ovrc.fit(
#                             X_new, tasks[task[0]]),
#                                                    direction='forward',
#                                                    **sfs_params).fit(
#                             session.computed_.signal_matrix,
#                             tasks[task[0]]).get_feature_names_out()]
#                         for task in tqdm_(enumerate(tasks))]

# bwd_reduced_matrices = [fwd_reduced_matrices[task[0]][SequentialFeatureSelector(
#                             next(LinearSVCGen()), direction='backward',
#                             **sfs_params).fit(fwd_reduced_matrices[task[0]],
#                                               tasks[task[0]]).get_feature_names_out()]
#                         for task in tqdm_(enumerate(tasks))]
# validate_model(next(LinearSVCGen()),
#                X=reduce_matrix(X),
#                          y=tasks[1],
#                          test_size=0.8,
#                          stratify=tasks[1],
#                          shuffle=True).round(2)
# validate_model(estimator=RandomForestRegressor(),
#                X=reduce_matrix(X), y=tasks[2], test_size=0.8)
# # Build feature/target arrays

# def validate_forest(X, y,
#                     test_size: float = 0.5,
#                     random_state: int = None
#                     ) :

#     # Generate train/test sets
#     X_train, X_test, y_train, y_test = train_test_split(X, y,
#                                                         test_size=.3,
#                                                         random_state=1121218)

# %%time
# # Init, fit, score
# forest = RandomForestRegressor()
# _ = forest.fit(X_train, y_train)

# >>> print(f"Training score: {forest.score(X_train, y_train)}")
# Training score: 0.9860728454127408

# >>> print(f"Test score: {forest.score(X_test, y_test)}")

In [None]:
def clusterize(X, y,
               method: str = 'spearnan',
               thresh: float = 0.9,
               positive: bool = True,
               n_deci: int = 32
               ) -> Bunch:
    X = X.set_axis(y, axis=0)
    eq_sign = tuple(filter(lambda x: x[0],
                           ((positive, pd.DataFrame.gt),
                            (not positive, pd.DataFrame.lt))))[0][1]

    return Bunch(**dict(tuple((cond, Bunch(**dict(
               tuple((row[0], row[1].dropna().to_dict())
                     for row in
                     X.loc[cond].corr(method).round(n_deci).where(
                         X.loc[cond].corr(method).round(n_deci) !=
                         np.triu(X.loc[cond].corr(method).round(
                             n_deci).values)).where(eq_sign(X.loc[cond].corr(
                         method).round(n_deci),
                                     thresh)).iterrows()
                     if row[1].dropna().to_dict() != {}))))
                              for cond in list(set(tuple(y))))))

In [None]:
from sklearn.manifold import TSNE

In [None]:
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(
    X, tasks[2], test_size=0.5, random_state=1121218)

X_train_std, X_test_std = list(map(StandardScaler().fit_transform, [X_train, X_test]))

y_train, y_test =  y_train.values.reshape(-1, 1), y_test.values.reshape(-1, 1)

# Init, fit
rfecv = RFECV(
    estimator=next(LinearSVCGen()),
    min_features_to_select=50,
    step=5,
    n_jobs=-1,
    scoring="accuracy",
    cv=5,
)

X[rfecv.fit(X, tasks[2]).get_feature_names_out()]


# lr = next(LinearSVCGen())
# lr.fit(X_train_std, y_train)

# print("Trainign R-sqaured:", lr.score(X_train_std, y_train))
# print("Testing R-squared:",lr.score(X_test_std, y_test))

In [None]:
from cimaq_decoding_utils import chunks
from cimaq_decoding_utils import flatten
    
X = session.computed_.signal_matrix.copy(deep=True)
tasks = [session.events.trial_type,
         session.events.recognition_performance.replace({'Miss':'Fail'}),
         session.events.iloc[:, -1]]

X = X.set_axis(tasks[2], axis=0)





def PairwiseClusterNames(X,
                         method: str = 'spearman',
                         thresh: float = 0.90
                         ):

    X_corr = trimmed_corr_mat(X, method=method)
    pairs = X_corr[X_corr > thresh].stack()
    pairs.sort_values(ascending=False, inplace=True)
    return list(map(list, pairs.index.tolist())).__iter__()
        
    
def mknew_(X, y):

    return pd.Series(data=X[list(y)].T.mean().values,
                     index=X.index, name=','.join(y))


# def MakePairwise(X, method: str = 'spearman',
#                  thresh=0.90):
    
#     X = X.copy(deep=True)
#     mat = trimmed_corr_mat(X, method=method, thresh=thresh)
# #     clusters = [[str(row[0])]+
# #     pairs = list(PairwiseCorrelates(X, method, thresh))
# #     new_ = pd.concat([mknew_(X, pair) for pair in pairs], axis=1)
# #     X.drop(flatten(pairs), axis=1, inplace=True)

#     return new_
#     while PairwiseClusterNames(new_, method, thresh):
        
    

# def RecursivePairwise(X, method: str = 'spearman',
#                       thresh=0.90):
    


def PairwiseCorrelates(X, method: str = 'spearman',
                       thresh=0.90):

    X_corr = trimmed_corr_mat(X, method=method)
    corr_list = PairwiseClusterNames(X, method=method,
                                     thresh=thresh)

    yield from corr_list.__iter__()
    
# trimmed_corr_mat(MakePairwise(X))[trimmed_corr_mat(MakePairwise(X))>0.9].stack().index.tolist()\
# MakePairwise(
# [scipy.stats.spearmanr(mknew_(X, PairwiseClusterNames(X)[0]), row[1].values)
#  for row in X.iterrows()]

In [None]:
# clust = [[row[0]]+row[1].dropna().index.tolist() for row in trimmed_corr_mat(X).iterrows()]
# len(clust), len(flatten(clust)), abs(len(clust) - len(flatten(clust)))

sc = X.corr('spearman').where(X.corr('spearman').values
                         != np.tril(X.corr('spearman').values))
sc = sc.where(sc>0)

# trimmed_corr_mat(X)
sc.dropna(axis=0, how='all')

In [None]:
print(FeatureAgglomeration.__file__)

In [None]:
proximity_matrix

In [None]:
# trimmed_corr_mat(, thresh=0.99)#.filter(regex='cingu').T.filter(regex='cingu')

a=proximity_matrix[proximity_matrix!=np.tril(proximity_matrix)].filter(
    regex='lh')['retrocalcarine cortex lh'].dropna()
a['inferior temporal sulcus anterior rh'], a['inferior temporal sulcus anterior lh']
#     .filter(regex='posterior').T.filter(
#     regex='cingu').filter(regex='posterior').filter(regex='lh').T.sort_index().filter(regex='rh').sort_index()

In [None]:
from nilearn.connectome import ConnectivityMeasure
from sklearn.cluster import FeatureAgglomeration, ward_tree


pos_len = len(trimmed_corr_mat(X, positive=True,
                               thresh=0.9).stack().index.tolist())
neg_len = len(trimmed_corr_mat(X, positive=False,
                               thresh=-0.9).stack().index.tolist())
n_c = pos_len+neg_len

proximity_matrix = 1 - pd.DataFrame(pairwise_distances(np.array([img.get_fdata().flatten()
                                                               for img in
                                  nimage.iter_img(session.masker._resampled_maps_img_)])),
                                index=X.columns, columns=X.columns)

agglo = FeatureAgglomeration(n_clusters=56,
                             affinity='euclidean',
                             connectivity=proximity_matrix,
#                              connectivity=sprm_pw_neg,
                             compute_full_tree='auto',
                             linkage='ward',
                             pooling_func=np.mean,
#                              distance_threshold=None,
#                              distance_threshold=sprm_pw_neg.quantile(0.75).mean(),
                             compute_distances=True)
agglo.fit(X)

agglo.labels_mapping_ = pd.DataFrame(tuple(zip(agglo.labels_,
                                               agglo.feature_names_in_))).groupby(0).groups
agglo.features_names_mapping_ = dict(tuple((itm[0], X.iloc[:, itm[1]].columns.tolist())
                                     for itm in tuple(agglo.labels_mapping_.items())))
agglo.feature_names_out_ = [','.join(list(set(' '.join(val).split()))) for val in
                            tuple(agglo.features_names_mapping_.values())]
# w_agglo = Bunch(**dict(tuple(zip(['children', 'n_connected_components',
#                                   'n_leaves', 'parents', 'distances'],
#                                  ward_tree(X.corr('spearman'), connectivity=proximity_matrix,
#                                            n_clusters=64, return_distance=True)))))

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Feature, target arrays
# X, y = ansur.iloc[:, :-1], ansur.iloc[:, -1]
X = session.computed_.signal_matrix.copy(deep=True)
tasks = [session.events.trial_type,
         session.events.recognition_performance.replace({'Miss':'Fail'}),
         session.events.iloc[:, -1]]
y = tasks[2]
X.set_axis(y, axis=0, inplace=True)

# Train/test set generation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1121218
)

# Scale train and test sets with StandardScaler
X_train_std = StandardScaler().fit_transform(X_train)
X_test_std = StandardScaler().fit_transform(X_test)

# Fix the dimensions of the target array
y_train = y_train.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)

# Init, fit, test Lasso Regressor
forest = RandomForestRegressor()
_ = forest.fit(X_train_std, y_train)
forest_score = forest.score(X_test_std, y_test)

print(forest_score)
importance_table = pd.DataFrame(zip(X_train.columns,
                                    abs(forest.feature_importances_)),
                                columns=["feature", "weight"]).sort_values(
                       "weight").reset_index(drop=True)

from sklearn.feature_selection import RFE

# Init the transformer
rfe = RFE(estimator=RandomForestRegressor(), n_features_to_select=10)

# Fit to the training data
_ = rfe.fit(X_train_std, y_train)

# Init, fit, score
forest = RandomForestRegressor()
_ = forest.fit(rfe.transform(X_train_std), y_train)
forest.score(rfe.transform(X_test_std), y_test)

from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Init, fit
rfecv = RFECV(
    estimator=LinearRegression(),
    min_features_to_select=5,
    step=5,
    n_jobs=-1,
    scoring="r2",
    cv=5,
)

_ = rfecv.fit(X_train_std, y_train)


lr = LinearRegression()
_ = lr.fit(X_train_std, y_train)

print("Trainign R-sqaured:", lr.score(X_train_std, y_train))
print("Testing R-squared:",lr.score(X_test_std, y_test))

In [None]:
agglo_params = agglo.get_params(deep=True)
X_reduced = pd.DataFrame(agglo.transform(X), index=X.index,
                         columns=agglo.feature_names_out_)
X_restored = agglo.inverse_transform(X_reduced.values)

X_reduced4 = X_reduced
X_reduced4

In [None]:
display(all(X_reduced0 == X_reduced2))

In [None]:
X_reduced3

In [None]:
min_classes = [task.value_counts()[task.value_counts()==task.value_counts(ascending=True)[0]]
 for task in tasks]

min_classes = [c for c in min_classes if c.values == 1]
min_classes

In [None]:
validate_model(next(LinearSVCGen()),
               X=X_reduced.values,
                         y=tasks[1],
                         test_size=0.8,
                         stratify=tasks[1],
                         shuffle=True).round(2)

In [None]:
sprm_pw = pd.DataFrame(pairwise_distances(X.corr('spearman')),
             columns=X.columns, index=X.columns)
sprm = X.corr('spearman')
sprm_neg = 1-X.corr('spearman')
sprm_pw_neg = pd.DataFrame(pairwise_distances(sprm_neg),
                           columns=X.columns,
                           index=X.columns)
display(sprm_pw.describe(), sprm.describe(), sprm_neg.describe(), sprm_pw_neg.describe())

In [None]:
sprm_pw_neg.quantile(0.75).mean()

In [None]:
# trimmed_corr_mat(X)[trimmed_corr_mat(X)==trimmed_corr_mat(X).max()]#.sort_values(ascending=False)
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split



X = session.computed_.signal_matrix.copy(deep=True)
y = tasks[2]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy on test data: {:.2f}".format(clf.score(X_test, y_test)))

squareform(X)

In [None]:
result = permutation_importance(clf, X_train, y_train, n_repeats=10, random_state=42)
perm_sorted_idx = result.importances_mean.argsort()

tree_importance_sorted_idx = np.argsort(clf.feature_importances_)
tree_indices = np.arange(0, len(clf.feature_importances_)) + 0.5

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))
ax1.barh(tree_indices, clf.feature_importances_[tree_importance_sorted_idx], height=0.7)
ax1.set_yticks(tree_indices)
ax1.set_yticklabels(data.feature_names[tree_importance_sorted_idx])
ax1.set_ylim((0, len(clf.feature_importances_)))
ax2.boxplot(
    result.importances[perm_sorted_idx].T,
    vert=False,
    labels=data.feature_names[perm_sorted_idx],
)
fig.tight_layout()
plt.show()


In [None]:
tasks = [session.events.trial_type,
         session.events.recognition_performance.replace({'Miss':'Fail'}),
         session.events.iloc[:, -1]]

def clusterize(X, y,
               method: str = 'spearnan',
               thresh: float = 0.9,
               positive: bool = True,
               n_deci: int = 32
               ) -> Bunch:
    X = X.set_axis(y, axis=0)
    eq_sign = tuple(filter(lambda x: x[0],
                           ((positive, pd.DataFrame.gt),
                            (not positive, pd.DataFrame.lt))))[0][1]

    return Bunch(**dict(tuple((cond, Bunch(**dict(
               tuple((row[0], row[1].dropna().to_dict())
                     for row in
                     X.loc[cond].corr(method).round(n_deci).where(
                         X.loc[cond].corr(method).round(n_deci) !=
                         np.triu(X.loc[cond].corr(method).round(
                             n_deci).values)).where(eq_sign(X.loc[cond].corr(
                         method).round(n_deci),
                                     thresh)).iterrows()
                     if row[1].dropna().to_dict() != {}))))
                              for cond in list(set(tuple(y))))))





rfecv_params = dict(step=0.1, cv=None, n_jobs=11,
                    min_features_to_select=1,
                    scoring='accuracy')

sfs_params = dict(n_jobs=11, n_features_to_select=0.25,
                  scoring='accuracy')


ovrc = OneVsRestClassifier(next(LinearSVCGen()), n_jobs=11)



In [None]:
def get_recursive_best(X: Iterable,
                       y: Iterable = None,
                       estimator=None,
                       rfecv_params: Union[dict, Bunch] = None,
                       ) -> Iterable:
    rfecv_defs = dict(cv=None, step=0.01,  n_jobs=11,
                      min_features_to_select=1,
                      scoring='accuracy')
    if rfecv_params is None:
        rfecv_params = {}
    rfecv_defs.update(rfecv_params)
    if estimator is None:
        estimator = next(LinearSVCGen())
#     rfecv = RFECV(estimator, **rfecv_defs)
#     rez = yield X[rfecv.fit(X, y).get_feature_names_out()]
#     for n in range(2):
#         X = rez.send(rez[rez.columns], y)
#     yie
    rfecv = RFECV(estimator, **rfecv_defs).fit(X, y)
    selected = tuple(zip(rfecv.support_, rfecv.feature_names_in_))
    rez = yield [itm[1] for itm in list(filter(lambda x: x[0], selected))]
    yield from (rez.send(X) for n in range(4))

next(get_recursive_best(X, tasks[2]))
# X[get_recursive_best(X[get_recursive_best(X[get_recursive_best(X[get_recursive_best(X[get_recursive_best(X=X, y=tasks[2])], tasks[2])],
#                    tasks[2])], tasks[2])], tasks[2])]

In [None]:
from sklearn.feature_selection import SequentialFeatureSelector, SelectFromModel

sfm = SelectFromModel(next(LinearSVCGen()),
                      threshold=-np.inf, max_features=8)

bsfs = SequentialFeatureSelector(next(LinearSVCGen()),
                                 direction='backward',
                                 scoring='accuracy',
                                 n_jobs=11)

test = bsfs.fit(X, X.index).get_feature_names_out()
test
# 

In [None]:
method='spearman'
n_deci=2

tuple(X.corr(method).round(n_deci
                          ).where(
     X.corr(method).round(n_deci) !=
     np.triu(X.corr(method).round(
         n_deci).values)).where(X.corr(
     method).round(n_deci)>thresh).iterrows()
      for row in X.corr(method).iterrows()
 if row[1].dropna().to_dict() != {})

In [None]:
n = 2
sel0, sel1 = [RFECV(next(LinearSVCGen()),
                   **rfecv_params)] * n

X0 = X[sel0.fit(X, y).get_feature_names_out()]
sel2 = RFECV(next(LinearSVCGen()),
                   **rfecv_params)
sel2.fit(X0, y)
sel2.get_feature_names_out().__len__()

In [None]:
X = session.computed_.signal_matrix
y = tasks[2]
n = 4

selectors = [RFECV(next(LinearSVCGen()),
                   **rfecv_params)] * n
X = X[selectors[0].fit(X, y).get_feature_names_out()]
# X = X[selectors[1].fit(X, y).get_feature_names_out()]
X
# for sel in selectors:
#     sel.fit(X, y)
#     outnames = sel.get_feature_names_out()
#     sel.fit(X[outnames], y)
#     print(len(sel.get_feature_names_out()))
# X

In [None]:
# rfecv = RFECV(next(LinearSVCGen()),
#               **rfecv_params).fit(X, tasks[2])

def selector(X, y):
    rfecv_params = dict(step=0.2, cv=None, n_jobs=11,
                        min_features_to_select=1,
                        scoring='accuracy')
    selectors = RFECV(next(LinearSVCGen()),
                       **rfecv_params)
    return X[sel.fit(X, y).get_feature_names_out()]

X_new = selector(X, tasks[2])
#     [X[sel.fit(X, y).get_feature_names_out()]
#      >> ]
# rfecv = (RFECV(next(LinearSVCGen()),
#               **rfecv_params).fit(X, tasks[2]))
# selected = tuple(zip(rfecv.support_, rfecv.feature_names_in_))
# X[[itm[1] for itm in
#    list(filter(lambda x: x[0], selected))]]

In [None]:
y = tasks[2]
estimator = next(LinearSVCGen())
estimator.fit(X, y)
print(estimator.__dict__.keys())

coef_table = pd.DataFrame(estimator.coef_,
                          index=estimator.classes_,
                          columns=estimator.feature_names_in_)

In [None]:
# 
renamer = dict(tuple((itm[1], itm[0])
                     for itm in enumerate(y.unique())))

In [None]:
(coef_table.T*coef_table.T.std()).corr('spearman').plot(kind='barh')

In [None]:


coef_table.min()
# scaled_corr = MinMaxScaler().fit_transform(coef_table.corr('spearman'))
# sns.heatmap(1 - scaled_corr)

In [None]:
selected = tuple(zip(rfecv.support_, rfecv.feature_names_in_))

In [None]:
[itm[1] for itm in list(filter(lambda x: x[0], selected))].__len__()

In [None]:
# rfecv_matrices = [X[RFECV(next(LinearSVCGen()), **rfecv_params).fit(X, tasks[task[0]]).get_feature_names_out()]
#                   for task in tqdm_(enumerate(tasks))]


# print([mat.shape for mat in rfecv_matrices])

# fwd_reduced_matrices = [rfecv_matrices[task[0]]
#                         [SequentialFeatureSelector(ovrc.fit(
#                             X_new, tasks[task[0]]),
#                                                    direction='forward',
#                                                    **sfs_params).fit(
#                             session.computed_.signal_matrix,
#                             tasks[task[0]]).get_feature_names_out()]
#                         for task in tqdm_(enumerate(tasks))]

# bwd_reduced_matrices = [fwd_reduced_matrices[task[0]][SequentialFeatureSelector(
#                             next(LinearSVCGen()), direction='backward',
#                             **sfs_params).fit(fwd_reduced_matrices[task[0]],
#                                               tasks[task[0]]).get_feature_names_out()]
#                         for task in tqdm_(enumerate(tasks))]

In [None]:


fwd_reduced_matrices = [rfecv_matrices[task[0]]
                        [SequentialFeatureSelector(ovrc.fit(
                            X_new, tasks[task[0]]),
                                                   direction='forward',
                                                   **sfs_params).fit(
                            session.computed_.signal_matrix,
                            tasks[task[0]]).get_feature_names_out()]
                        for task in tqdm_(enumerate(tasks))]

bwd_reduced_matrices = [fwd_reduced_matrices[task[0]][SequentialFeatureSelector(
                            next(LinearSVCGen()), direction='backward',
                            **sfs_params).fit(fwd_reduced_matrices[task[0]],
                                              tasks[task[0]]).get_feature_names_out()]
                        for task in tqdm_(enumerate(tasks))]

print([mat.shape for mat in fwd_reduced_matrices])

In [None]:
from nilearn.regions import img_to_signals_maps, signals_to_img_maps, _compute_weights


In [None]:
# # session.masker.maps_img.shape
fmri4d = signals_to_img_maps(region_signals=session.computed_.signal_matrix,
                             maps_img=session.masker._resampled_maps_img_,
                             mask_img=session.masker._resampled_mask_img_)
# session.masker.__dict__.keys()
#.maps_img.shape, session.masker.mask_img.shape


In [None]:
rfecv_matrices = [rfecv_matrices[task[0]][RFECV(next(LinearSVCGen()), **rfecv_params).fit(
                      session.computed_.signal_matrix, tasks[task[0]]).get_feature_names_out()]
                  for task in tqdm_(enumerate(tasks))]
print([mat.shape for mat in rfecv_matrices])

In [None]:
fwd_reduced_matrices = [rfecv_matrices[task[0]][SequentialFeatureSelector(
                            next(LinearSVCGen()), direction='forward',
                            **sfs_params).fit(rfecv_matrices[task[0]],
                                              tasks[task[0]]).get_feature_names_out()]
                        for task in tqdm_(enumerate(tasks))]

In [None]:




bwd_reduced_matrices = [fwd_reduced_matrices[task[0]][SequentialFeatureSelector(
                            next(LinearSVCGen()), direction='backward',
                            **sfs_params).fit(fwd_reduced_matrices[task[0]],
                                              tasks[task[0]]).get_feature_names_out()]
                        for task in tqdm_(enumerate(tasks))]
display(*bwd_reduced_matrices)

In [None]:
def make_savepath(data_dir, dimension, resolution_mm):
    dim, res = str(dimension), str(resolution_mm)+'mm'
    suffix = f'difumo_{dim}_{res}_cortex_labels.txt'
    while True:
        yield (os.path.join(data_dir, suffix))


def get_cortex_atlas(atlas):
    from nilearn.plotting import find_probabilistic_atlas_cut_coords
    not_cortex = (atlas.labels.reset_index(drop=False).set_index(
                      'difumo_names').T.filter(
                          regex='|'.join(no_cortex)).columns.tolist())
    cortex = atlas.labels.reset_index(drop=False).set_index(
                 'difumo_names').drop(list(not_cortex), axis=0)
    
    new_map = nilearn.image.index_img(atlas.maps, cortex.component)
    cortex = cortex.drop(['component'], axis=1).reset_index(drop=False)
    cortex['component'] = range(cortex.shape[0])
    cortex = cortex.set_index('component')
    cortex[['x', 'y' 'z']] = find_probabilistic_atlas_cut_coords(new_map)
    return Bunch(**dict(maps=new_map, labels=cortex))

# cortex_atlases_dir = '/data/simexp/fnadeau/nilearn_atlases/'
# cortex_atlases = [get_cortex_atlas(atlas) for atlas in atlases]
# cortex_atlases[0]

In [None]:
di1024_cortex_coords = niplot.find_probabilistic_atlas_cut_coords(cortex_atlases[0].maps)

In [None]:
smooth_roi0 = nimage.smooth_img(nimage.mean_img(nimage.index_img(cortex_atlases[0].maps, [0])), 8 )
roi_data = np.array([img.get_fdata().flatten() for img in
                     tqdm_(list(nimage.iter_img(cortex_atlases[0].maps)))])
distances = pairwise_distances(roi_data)
distances=pd.DataFrame(distances,
                       index=cortex_atlases[0].labels.difumo_names,
                       columns=cortex_atlases[0].labels.difumo_names)
distances = distances.where(distances.values!=np.tril(distances.values))
nearest_rois = pd.DataFrame(distances[distances==distances.min()].stack().index.tolist())
# neighbors = [nearest_rois.groupby(0).get_group(grp) for grp in nearest_rois.groupby(0).groups]
X = session.computed_.signal_matrix
corr_mat = X.corr('spearman')
corr_mat = corr_mat.where(corr_mat.values!=np.tril(corr_mat.values))
constants = pd.DataFrame(corr_mat[corr_mat > 0.9].stack().index.tolist())
region_clusters = pd.DataFrame([val for val in constants.values
                                if val in nearest_rois.values])
cluster_dict = dict(tuple((grp, [v for v in
                                 flatten(region_clusters.groupby(0).get_group(grp).values.tolist())
                                 if v != grp])
                          for grp in region_clusters.groupby(0).groups))
cluster_names = [flatten(item) for item in tuple(cluster_dict.items())]
cluster_ids = [cortex_atlases[0].labels.loc[[name[0] for name in
                                            enumerate(cortex_atlases[0].labels.difumo_names.tolist())
                          if name[1] in cluster]].index.tolist()
               for cluster in cluster_names]

cluster_maps = [nimage.mean_img(nimage.index_img(cortex_atlases[0].maps, cluster_id))
                for cluster_id in cluster_ids]
name_enum = enumerate(cortex_atlases[0].labels.difumo_names.tolist())
unclustered = cortex_atlases[0].labels.loc[[name[0] for name in name_enum
                                            if name[1] not in flatten(cluster_names)]]
unclustered_names, unclustered_ids = unclustered.difumo_names.tolist(), unclustered.index.tolist()

unclustered_maps = nimage.index_img(cortex_atlases[0].maps, unclustered_ids)


In [None]:

not_cortex = di1024.labels.reset_index(drop=False).set_index('difumo_names').T.filter(
                 regex='|'.join(no_cortex))
cortex = di1024.labels.reset_index(drop=False).set_index(
             'difumo_names').T.drop(not_cortex.columns, axis=1).T


In [None]:
maps_masker512 = NiftiMapsMasker(maps_img=cortex_atlases[1].maps,
                              mask_img=session.mask_img,
                              t_r=get_t_r(session.fmri_img),
                              resampling_target='mask',
                              **session.masker_defs).fit()

In [None]:
trial_type_cols=['trial_type', 'recognition_performance',
                 'ctl_miss_ws_cs']
session.glm_defs.update({'signal_scaling': False})
session.computed_ = get_all_contrasts(fmri_img=session.fmri_img,
                               events=session.events,
                               masker=session.masker,
                               ctl_id='Ctl',
                               output_type='effect_size',
                               trial_type_cols=trial_type_cols,
                               standardize=True,
                               scale=False,
                               maximize=False,
                               glm_kws=session.glm_defs,
                               design_kws=session.design_defs,
                               feature_labels=)


print(maps_masker512.signal_matrix.shape)

In [None]:
from sklearn.metrics import explained_variance_score

linear_svc = next(LinearSVCGen())
tasks = [session.events.trial_type,
         session.events.recognition_performance.replace({'Miss':'Fail'}),
         session.events.iloc[:, -1]]

display(*[validate_model(linear_svc,
                         X=session.computed_.signal_matrix,
                         y=tasks[task[0]],
                         test_size=0.8,
                         stratify=tasks[task[0]],
                         shuffle=True).round(2)
         for task in enumerate(tasks)])

In [None]:
# sorted(dir(linear_svc))
# help(linear_svc.score)
# linear_svc.predict(X)
# linear_svc.__dict__.keys()
help(explained_variance_score)

In [None]:
# help(explained_variance_score
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA

X = session.computed_.signal_matrix.copy(deep=True)
y = tasks[2]
X = X.set_axis(y, axis=0)

pca = PCA()
pca.fit(X.loc['Cs'])

pca.explained_variance_ratio_, 

var_table = pd.DataFrame(pca.components_, columns=pca.feature_names_in_)#.max().sort_values(ascending=False)
# pca.singular_values_.shape
# linear_svc.fit(X, y)

# # y_pred = cross_val_predict(linear_svc, X, y, groups=y)
# y_pred = linear_svc.predict(X)
# evs = explained_variance_score(y_true=y, y_pred=y_pred)
# evs.__dict__

In [None]:
X.var().sort_values(ascending=False)

In [None]:

def validate_classif(estimator,
                    X: Iterable, y: Iterable,
                    test_size: float = 0.8,
                    cv: Union[int, callable] = None,
                    stratify: Iterable = None,
                    random_state: int = None,
                    **kwargs
                    ) -> float:

    from sklearn.metrics import classification_report
    from sklearn.model_selection import cross_val_predict

    validation_params = dict(test_size=test_size, shuffle=True,
                             stratify=stratify, random_state=None)

    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        **validation_params)

#     estimator.fit(X_train, y_train)
    cv_score = cross_val_predict(estimator.fit(X_train, y_train),
                                 X_test, y_test, groups=y_test, cv=cv)
    return round((len(list(filter(None, cv_score == y_test))) /
                  len(y_test)), 2)
#     cr_test = pd.DataFrame(classification_report(y_pred=cv_score,
#                                                  y_true=y_test,
#                                                  output_dict=True,
#                                                  zero_division=0)).accuracy.mean()
    return cr_test

In [None]:
from nilearn.regions.rena_clustering import recursive_neighbor_agglomeration, ReNA
from nilearn.regions import Parcellations
# help(ReNA)

rena = Parcellations(method='rena', n_parcels=X.shape[1]/2,
                     standardize=False, smoothing_fwhm=2.,
                     scaling=True)



# rena = ReNA(session.masker._resampled_mask_img_, n_clusters=3, scaling=False,
#             n_iter=10, threshold=1e-07)

rena.fit(list(nimage.iter_img(fmri4d)))
# help(rena.fit)
# rena_clusters = recursive_neighbor_agglomeration(X.T.values,
#                                               session.mask_img, 3)
# session.masker.__dict__
# help(rena)

In [None]:
def cl_(X, method: str = 'spearman', thresh: float = 0.9):
    mat = trimmed_corr_mat(X, method=method, thresh=0.9)
    clusters = []
    for row in mat.iterrows():
        names_ = [row[0]]+[str(ind) for ind 
                           in row[1].dropna().index]
        feature_ = pd.Series(data=X[names_].T.mean().values,
                             index=X.index, name=' '.join(names_))
        mat.drop(names_, axis=1, inplace=True)
        clusters.append(feature_)
    return clusters, mat

clusters, mat = cl_(X)

In [None]:
clusters_ = yield pd.concat([mknew_(X, flatten((row[0], list(row[1].dropna().to_dict()))))
                       for row in trimmed_corr_mat(X, method=method, thresh=thresh).iterrows()], axis=1)

In [None]:
from nilearn.connectome import ConnectivityMeasure as CM
from sklearn.cluster import AgglomerativeClustering
from cimaq_decoding_utils import factorGenerator

cm = CM(kind='correlation').fit_transform([X.values])
# sorted(cm.__dict__.keys())
aggc = AgglomerativeClustering(n_clusters=None,
                               connectivity=cm[0],
                               compute_full_tree=True,
                               linkage='ward',
                               distance_threshold=-np.inf,
                               compute_distances=True).fit(X.corr('spearman'))


In [None]:
aggc.__dict__

In [None]:
clusters_ = pd.concat([mknew_(X, flatten((row[0], list(row[1].dropna().to_dict()))))
                       for row in trimmed_corr_mat(X).iterrows()], axis=1)

In [None]:
from scipy.stats import spearmanr
# help(spearmanr)




# trimmed_corr_mat(clusters_)

clusters_2 = pd.concat([mknew_(clusters_, flatten((row[0], list(row[1].dropna().to_dict()))))
                       for row in trimmed_corr_mat(clusters_).iterrows()], axis=1)

clusters_3 = pd.concat([mknew_(clusters_2, flatten((row[0], list(row[1].dropna().to_dict()))))
                       for row in trimmed_corr_mat(clusters_2).iterrows()], axis=1)

clusters_4 = pd.concat([mknew_(clusters_3, flatten((row[0], list(row[1].dropna().to_dict()))))
                       for row in trimmed_corr_mat(clusters_3).iterrows()], axis=1)
clusters_4
# X.[spearmanr(X).correlation>0.9]
# help(pairwise_distances)

In [None]:
# max([len(c) for c in PairwiseClusterNames(X)])

clusters_ = [pwc+[row[0] for row in X.T.iterrows() if
             scipy.stats.spearmanr(row[1], mknew_(X, pwc)).correlation > 0.9]
             for pwc in tqdm_(PairwiseClusterNames(X))]

In [None]:
from collections import Counter

len(tuple(dict(Counter(flatten(clusters_)).most_common()).keys()))

In [None]:
hierarchy.ward(trimmed_corr_mat(X)[trimmed_corr_mat(X)>0.9].fillna(0))

In [None]:
# sorted(dir(hierarchy))
# help(hierarchy.ClusterNode)
def make_clusters(X, thresh:float=0.9):
    idx = pd.DataFrame(trimmed_corr_mat(X, thresh=thresh).stack().index.tolist()).groupby(0).groups
    return [pd.Series(data=X[flatten([itm[0], X.iloc[:, itm[1]].columns.tolist()])].T.mean(),
                      index=X.index, name=)
             for itm in tuple(idx.items())]

make_clusters(X)
# leaves_list, leaders

In [None]:
from scipy.spatial.distance import squareform

from scipy.cluster import hierarchy

# dist_linkage = hierarchy.ward(X.corr('spearman'))
cluster_ids = hierarchy.fcluster(hierarchy.ward(trimmed_corr_mat(X)[trimmed_corr_mat(X)>0.9].fillna(0)), 1)
dict(tuple((grp,
            flatten(pd.DataFrame(tuple(zip(cluster_ids, X.columns))).groupby(0).get_group(
                grp).set_index(0).values.tolist()))
           for grp in pd.DataFrame(tuple(zip(cluster_ids, X.columns))).groupby(0).groups))
# hierarchy.fcluster(dist_linkage, 1, criterion="distance")

In [None]:
# from cimaq_decoding_pipeline import get_optimal_features
def get_optimal_features(X: [np.ndarray, pd.DataFrame]
                         ) -> list:
    """
    Return optimal features using hierarchical clustering.


    """

    from scipy.cluster import hierarchy
    from scipy.spatial.distance import squareform
    from collections import defaultdict

    if isinstance(X, pd.DataFrame):
        corr = X.corr(method='spearman').fillna(0).values
        np.fill_diagonal(corr, 1)
    else:
        corr = spearmanr(X).correlation
        # Ensure the correlation matrix is symmetric
        corr = (corr + corr.T) / 2
        np.fill_diagonal(corr, 1)
        np.nan_to_num(corr, 0)
    # Converting the correlation matrix to a distance matrix
    distance_matrix = 1 - np.abs(corr)
    # hierarchical clustering using Ward's linkage
    dist_linkage = hierarchy.ward(squareform(distance_matrix))

    cluster_ids = hierarchy.fcluster(dist_linkage, 1, criterion="distance")
    cluster_id_to_feature_ids = defaultdict(list)

    for idx, cluster_id in enumerate(cluster_ids):
        cluster_id_to_feature_ids[cluster_id].append(idx)
    selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]
    return selected_features

get_optimal_features(X)

In [None]:
PairwiseClusters(X=MakePairwise(X))

In [None]:

    





def PairwiseClusters(X, method: str = 'spearman',
                     thresh=0.90):
    X = X.copy(deep=True)
    pairs = list(PairwiseCorrelates(X, method, thresh))
    new_ = pd.concat([mknew_(X, pair)
             for pair in pairs], axis=1)
    X.drop(flatten(pairs), axis=1, inplace=True)
    X = pd.concat([new_, X], axis=1)
#     new_pairs = list(PairwiseCorrelates(X_new, method, thresh))
#     new_ = pd.concat((new_, pd.concat([mknew_(X_new, pair)
#                       for pair in new_pairs], axis=1)), axis=1)
    return X

X = session.computed_.signal_matrix.copy(deep=True)
PairwiseClusters(X=PairwiseClusters(X))
#     names_ = next(pairs)
#     new_ = mknew_(X, names_)
#     X.drop(names_, axis=1, inplace=True)
    
#     names_ = list(next(pairs) for n in range(step))
#     new_ = pd.concat([mknew_(Xcp, pair) for pair in pairs], axis=1)
#     Xcp.drop(flatten(pairs), axis=1, inplace=True)
#     return pd.concat([new_, Xcp], axis=1)


# def IterativePairwiseClusters(X, method: str = 'spearman',
#                               thresh=0.90, step=1):
    
# #     gen = yield PairwiseClusters(X, method, thresh)
#     for n in tqdm_(range(step)):
#         X = PairwiseClusters(method=method, thresh=thresh)
#     return X
# 
# X = X.set_axis(tasks[2])
# step = 3
# for _ in tqdm_(range(step)):
#     X_new = PairwiseClusters(X)
#     X = PairwiseClusters(X_new)
# X
# pairs = list(PairwiseCorrelates(X_new))
# # flatten(pairs)
# # any([name in X_new.columns for name in flatten(pairs)])
# X_new.drop(flatten(pairs), axis=1)
# new_ = pd.concat([mknew_(X_new, pair) for pair in pairs], axis=1)
# new_.shape, X_new.drop(flatten(pairs), axis=1).shape
# list(PairwiseCorrelates(X_new)).__len__()

In [None]:
X_trim = trimmed_corr_mat(X)[trimmed_corr_mat(X)>0.9].dropna(
             how='all', axis=1).dropna(how='all', axis=0)
to_group = [flatten((row[1].name, row[1].dropna().round(2).index.tolist()))
            for row in X_trim.iterrows()]
new_ = pd.concat([mknew_(X, g) for g in to_group], axis=1)


# pd.concat((new_, X.drop(flatten(to_group), axis=1)), axis=1)
# # [len(g) for g in to_group]
X_trim_new = trimmed_corr_mat(new_)[trimmed_corr_mat(new_)>0.9].dropna(
             how='all', axis=1).dropna(how='all', axis=0)

to_group_new = [flatten((row[1].name, row[1].dropna().round(2).index.tolist()))
                for row in X_trim_new.iterrows()]
new_new = pd.concat([mknew_(new_, g) for g in to_group_new], axis=1)
PairwiseClusterNames(new_new)
# new_new

In [None]:
3from itertools import combinations
triplets = list(map(list, list(combinations(maps_masker512.signal_matrix.columns, 3))))
len(triplets)

In [None]:

best_roi = pd.Series(((validate_classif(next(LinearSVCGen()),
                                     X=maps_masker512.signal_matrix[col].values.reshape(-1, 1),
                                     y=tasks[2], test_size=0.4,
                                     stratify=tasks[2],
                                     shuffle=True)
                    for col in tqdm_(maps_masker512.signal_matrix.columns))),
                     index = maps_masker512.signal_matrix.columns)
best_roi[best_roi==best_roi.max()]

In [None]:
# len(pairs), pd.Series(tuple(map(tuple, pairs))).nunique()

roi0 = Counter(itm[0] for itm in
        [pd.Series(pairs).loc[pairwise_results[
            pairwise_results>0.99].index].values.tolist()][0]).most_common(1)[0][0]
roi1 = Counter(itm[1] for itm in [pd.Series(pairs).loc[pairwise_results[
    pairwise_results>0.99].index].values.tolist()][0]
        if itm[0] == 'precentral sulcus mid-inferior rh').most_common(1)[0][0]

(roi0, roi1)

In [None]:
# Initialize estimators
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
linsvc = LinearSVC(class_weight='balanced')
linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
ovrc = OneVsRestClassifier(svc, n_jobs=11)



# ovrc = LogisticRegression(max_iter=1000)



In [None]:
di1024 = get_difumo(1024, 3, atlases_dir)


not_cortex = di1024.labels.reset_index(drop=False).set_index('difumo_names').T.filter(
                 regex='|'.join(no_cortex))
cortex = di1024.labels.reset_index(drop=False).set_index(
             'difumo_names').T.drop(not_cortex.columns, axis=1).T

cortex_atlas = nimage.index_img(di1024.maps, cortex.component.tolist())
cortex_names = cortex.index.tolist()
Path('/data/simexp/fnadeau/cortex-difumo-693-labels.txt').write_text('\n'.join(cortex_names))

In [None]:
sub_id, ses_id, task, space = os.path.basename(session.fmri_path).split('_')[:-2]

prefix = '_'.join(os.path.basename(session.fmri_path).split('_')[:-2])

sorted(Path(masker_dir).rglob(f'{sub_id}_{ses_id}*.pickle'))

In [None]:
X = session.computed_.signal_matrix
corr_mat = X.corr('spearman')
corr_mat = corr_mat.where(corr_mat.values!=np.tril(corr_mat.values))
constants = pd.DataFrame(corr_mat[corr_mat > 0.9].stack().index.tolist())
activation_neighbors = [constants.groupby(0).get_group(grp)
                        for grp in constants.groupby(0).groups]

len(activation_neighbors), max([len(n) for n in activation_neighbors])
# pd.DataFrame(distances[distances==distances.min()].stack().index.tolist())


In [None]:
[n for n in activation_neighbors if n.shape[0] == 9][0]

In [None]:
activation_ntable = pd.concat(activation_neighbors)#.set_index(0).sort_index()
# activation_ntable[activation_ntable[0]==activation_ntable[1]]
neighbortable = pd.concat(neighbors)#.set_index(0).sort_index()
# display(activation_ntable, neighbortable)
# neighbortable.where(neighbortable[0] != neighbortable[1])

In [None]:
common_atable = activation_ntable.loc[activation_ntable.index.intersection(neighbortable.index)]
comon_ntable = neighbortable.loc[neighbortable.index.intersection(activation_ntable.index)]


In [None]:
session.computed_.signal_matrix.drop(unclustered_names, axis=1)
# (X<0).any().any()
# X.min().min()
# session.computed_.whole.contrast_img.get_fdata().min().min()

In [None]:
len(set(flatten(cluster_names)))

In [None]:
# maps_masker = NiftiMapsMasker(maps_img=cortex_atlas,
#                               mask_img=session.mask_img,
#                               t_r=get_t_r(session.fmri_img),
#                               resampling_target='mask',
#                               **session.masker_defs).fit()

In [None]:
# from nilearn.connectome import ConnectivityMeasure as CM
# from sklearn.manifold import MDS
# from  sklearn.decomposition import PCA
# from cimaq_decoding_utils import factorGenerator

# X = session.computed_.signal_matrix
# X_factors = list(factorGenerator(X.shape[1]))

# cm = CM(kind='correlation')
# connect_mat = cm.fit_transform([session.computed_.signal_matrix.values])[0]

# mds = MDS(n_components=connect_mat.shape[1],
#           metric=True,
#           max_iter=X.shape[0], eps=1e-12,
#           random_state=sample(range(10), 1)[0],
#           dissimilarity="precomputed",
#           n_jobs=11)

# mds2 = MDS(n_components=X.shape[1],
#            metric=True,
#            max_iter=X.shape[0], eps=1e-12,
#            random_state=sample(range(10), 1)[0],
#            dissimilarity="euclidean",
#            n_jobs=11)

# # X_factors
# # connect_mat.shape
# mds.fit(connect_mat)
# mds2.fit(X)

# embedding = pd.DataFrame(mds.embedding_, index=X.columns,
#                          columns=X.columns)

# dsm = pd.DataFrame(mds.dissimilarity_matrix_,
#                    index=X.columns, columns=X.columns)


# embedding2, dsm2 = mds2.embedding_, mds2.dissimilarity_matrix_


In [None]:
def double_inputs(step=3):
    _ = 0
    while _ < step:
        x = yield
        yield x * 2
    _ += 1
        
gen = double_inputs(step=4)
next(gen)
gen.send(2)
# next(gen)


In [None]:
import sklearn

[p for p in sorted(Path(os.path.dirname(pd.__file__)).rglob('*.py'))
 if re.match('yield', p.read_text())]

In [None]:
import itertools
from itertools import chain, takewhile, tee, repeat
# sorted(dir(itertools))
help(repeat)


In [None]:
def IterativePairwiseClusters(X, *,method: str = 'spearman',
                              thresh=0.90, step=1):
    rez = {}
    for _ in range(step):
        pairs = list(PairwiseCorrelates(X, method=method, thresh=thresh))
#         [rez.update(mknew_(X, pair))  for pair in pairs]
# #         new_ = pd.concat([mknew_(X, pair) for pair in pairs], axis=1)
#         X.drop(flatten(pairs), axis=1, inplace=True)
#         [rez.update(item) for item in X.iteritems()]
#     return rez
#         X = pd.concat([new_, X], axis=1)
#     return pd.DataFrame.from_dict(rez, orient='columns')

#     for _ in range(step):
        
#         X = PairwiseClusters(X, method=method, thresh=thresh)
#     return X
#     X = PairwiseClusters(X=X, method=method, thresh=thresh)
#     gen = (PairwiseClusters(X, method=method, thresh=thresh)
#            for _ in range(step))
      
    
#     X_new = PairwiseClusters(X=X, method=method, thresh=thresh)
#     next(gen)
#     (gen.send(PairwiseClusters(X_new, method=method, thresh=thresh))
#      for _ in range(step))
#     return X_new
#     for _ in tqdm_(range(step)):
#         next(gen).send(gen)
#     X = PairwiseClusters(X, method, thresh)
#     :
#         X = PairwiseClusters(X, method, thresh)
#     return X

gen = IterativePairwiseClusters(X, step=3)
# display(*tuple(gen))
# tuple(gen)
gen

# help(gen.throw)
# display(*[next(gen).shape for i in range(3)])
# gen.send(2)

In [None]:
next(gen)

In [None]:
import defer

@defer.inlineCallbacks
def doStuff():
    result = yield takesTwoSeconds()
    nextResult = yield takesTenSeconds(result * 10)
    defer.returnValue(nextResult / 10)


In [None]:
from nilearn.reporting import get_clusters_table

cluster_table = [get_clusters_table(img, 0.05, cluster_threshold=None,
                   two_sided=False, min_distance=8.0)
                 for img in nimage.iter_img(fmri4d)]

In [None]:
cluster_table = list(filter(lambda x: x.shape[0] != 0, cluster_table))

In [None]:
cluster_table[0]

In [None]:
from scipy.sparse import csgraph, coo_matrix, dia_matrix
# help(coo_matrix)
from nilearn.regions.rena_clustering import recursive_neighbor_agglomeration as rnagg
help(rnagg)

In [None]:
thresh = 0.9

X_new = PaiwiseClusters(X)
# X_corr = trimmed_corr_mat(X)


# X_drop = X_corr[X_corr > thresh].dropna(
#              how='all', axis=0).dropna(how='all', axis=1)
# # X_drop.iloc[4]
# # X_drop.loc[X_drop.T.notna().sum().sort_values(ascending=False).index[0]].dropna().index

# to_combine = [row[1].dropna().index.tolist() for row in X_drop.iterrows()]
# to_combine
# # 'lateral occipital cortex inferior lh' in X_drop.index.tolist()

In [None]:

    


#         gen = yield gen.send(gen)
#     return gen
#     yield
    
    
"""
@defer.inlineCallbacks
def doStuff():
    result = yield takesTwoSeconds()
    nextResult = yield takesTenSeconds(result * 10)
    defer.returnValue(nextResult / 10)
"""
#     clusters_ = (PairwiseClusters(X, method, thresh))
#     test = clusters.send
#     new_features_ = mknew_
#     new_feature = pd.Series(data=X[corr_list].T.mean().values,
#                             index=X.index, name=' '.join(test00))
    
#     new_features = pd.concat([pd.Series(data=X[list(pair)].T.mean().values,
#                               index=X.index, name=' '.join(pair))
#                     for pair in corr_list], axis=1)
#     yield pd.concat([new_features, X.drop(flatten(corr_list), axis=1)], axis=1) 
    
#     to_combine = (flatten(chunk) for chunk in chunks(corr_list, step))
#     new_feature = next(pd.Series(data=X[flatten(chunk)].T.mean().values,
#                                   index=X.index, name=' '.join(chunk))
#                         for chunk in chunks(corr_list, step))
#     X_new = X.drop(to_combine, axis=1)
#     to_combine = flatten(corr_list[:step])
#     new_feature = pd.Series(data=X[to_combine].T.mean().values,
#                             index=X.index, name=' '.join(to_combine))
#     X_new = X.drop(to_combine, axis=1)
#     return new_feature

# X.shape[1] - 
PairwiseClusters(PairwiseClusters(X))
# test02 = PairwiseClusters(X, thresh=0.90, step=6)
# trimmed_corr_mat(test02)[test02.iloc[:, :5].columns]

In [None]:
pairs = PairwiseClusters(X, thresh=0.90, step=3)

test01 = [mknew_(X, pair) for pair in pairs]
new_ = pd.concat(test01, axis=1)
X_tmp = X.drop(flatten(pairs), axis=1)
X_new01 = pd.concat([new_, X_tmp], axis=1)
list(PairwiseCorrelates(X_new01))
# any([col in flatten(list(PairwiseCorrelates(X_new01))) for col in new_.columns])

In [None]:
trimmed_cs = X.set_axis(tasks[2], axis=0).loc['Cs'].corr('spearman').where(
                  (X.set_axis(tasks[2], axis=0).loc['Cs'].corr('spearman').values !=
                   np.triu(X.set_axis(tasks[2], axis=0).loc['Cs'].corr('spearman').values)))

const_pos_cs = trimmed_cs[trimmed_cs > 0.9].stack().index.tolist()
const_neg_cs = trimmed_cs[trimmed_cs < -0.9].stack().index.tolist()

trimmed_ws = X.set_axis(tasks[2], axis=0).loc['Ws'].corr('spearman').where(
                 (X.set_axis(tasks[2], axis=0).loc['Ws'].corr('spearman').values
                  != np.triu(X.set_axis(tasks[2], axis=0).loc['Ws'].corr('spearman').values)))
    
const_pos_ws = trimmed_cs[trimmed_ws > 0.9].stack().index.tolist()
const_neg_ws = trimmed_cs[trimmed_ws < -0.9].stack().index.tolist()


len(const_pos_cs), len(const_neg_ws)
# [c for c in const_pos_cs if c[0] in [c2[0] for c2 in const_neg_ws]]
# ('postcentral sulcus medial', 1, 'superior temporal gyrus anterior medial rh')
# ('postcentral sulcus medial', -1, 'middle frontal gyrus mid-posterior superior rh')

# 
# const_neg_ws

In [None]:
pos00 = clusterize(X, tasks[0], 'spearman', 0.9,
                                positive=True, n_deci=2)
neg00 = clusterize(X, tasks[0], 'spearman', -0.9,
                                positive=False, n_deci=2)
pos01 = clusterize(X, tasks[1], 'spearman', 0.9,
                                positive=True, n_deci=2)
neg01 = clusterize(X, tasks[1], 'spearman', -0.9,
                                positive=False, n_deci=2)
pos02 = clusterize(X, tasks[2], 'spearman', 0.9,
                                positive=True, n_deci=2)
neg02 = clusterize(X, tasks[2], 'spearman', -0.9,
                                positive=False, n_deci=2)

In [None]:
X = session.computed_.signal_matrix.drop(constants, axis=1)


# conditions = X.index.unique()
pos_clusters = Bunch()
[pos_clusters.update(clusterize(X, task[1], 'spearman', 0.9,
                                positive=True, n_deci=2))
 for task in enumerate(tasks)]

neg_clusters = Bunch()
[neg_clusters.update(clusterize(X, task[1], 'spearman', -0.9,
                                positive=False, n_deci=2))
 for task in enumerate(tasks)]

neg_rois = flatten([flatten([itm[0] for itm in tuple(item[1].items())]
                 for item in tuple(neg_clusters[key].items()))
         for key in tuple(neg_clusters.keys())])

pos_rois = flatten([flatten([itm[0] for itm in tuple(item[1].items())]
                 for item in tuple(pos_clusters[key].items()))
         for key in tuple(pos_clusters.keys())])

all_rois = set(neg_rois+pos_rois)

len(all_rois)

In [None]:

def clusterize2ways(X, y,
                    method: str = 'spearnan',
                    thresh: float = 0.9,
                    n_deci: int = 32
                    ) -> Bunch:
                    
    pos_clusters = Bunch()
    [pos_clusters.update(clusterize(X, y,
                                    method, thresh,
                                    positive=True,
                                    n_deci=n_deci))
     for task in enumerate(tasks)]

    neg_clusters2 = Bunch()
    [neg_clusters2.update(clusterize(X[all_rois], task[1],
                                     'spearman', -0.9,
                                     positive=False, n_deci=2))
     for task in enumerate(tasks)]

    neg_rois2 = flatten([flatten([itm[0] for itm in tuple(item[1].items())]
                                 for item in tuple(neg_clusters2[key].items()))
             for key in tuple(neg_clusters2.keys())])

    pos_rois2 = flatten([flatten([itm[0] for itm in tuple(item[1].items())]
                                 for item in tuple(pos_clusters2[key].items()))
             for key in tuple(pos_clusters2.keys())])

    all_rois2 = set(neg_rois2+pos_rois2)
len(all_rois2)

In [None]:
# Initialize estimators
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
linsvc = LinearSVC(class_weight='balanced')
linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
ovrc = OneVsRestClassifier(svc, n_jobs=11)

In [None]:
len(set(neg_rois)), len(set(pos_rois)), len(set(all_rois))

In [None]:
display(*[validate_model(ovrc,
                         X=session.computed_.signal_matrix[all_rois],
                         y=tasks[task[0]], test_size=0.2,
                         stratify=tasks[task[0]],
                         shuffle=True)
         for task in enumerate(tasks)])

In [None]:
session.computed_.signal_matrix[all_rois].shape[1]*0.1

In [None]:
### Convert to function ###

# tasks = (session.events.trial_type,
#          session.events.recognition_performance.replace({'Miss': 'Fail'}),
#          session.events.iloc[:, -1])



In [None]:
display(*rfecv_matrices)

In [None]:
display(*bwd_reduced_matrices)

In [None]:
rfecv_matrices[1].columns==rfecv_matrices[2].columns

In [None]:
tasks = [session.events.trial_type, session.events.recognition_performance,
         session.events.iloc[:, -1]]

display(*[validate_model(ovrc,
                         X=bwd_reduced_matrices[task[0]],
                        y=tasks[task[0]],
                        test_size=0.8,
                        stratify=tasks[task[0]],
                        shuffle=True)
         for task in enumerate(tasks)])

In [None]:
tasks[-1].value_counts()

In [None]:


embeddings.shape
# help(PCA)
pca = PCA()
pca.fit(X)
pca.__dict__

# help(np.randint)


In [None]:
import numpy as np

from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection

from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA

EPSILON = np.finfo(np.float32).eps

seed = np.random.RandomState(seed=3)
# X_true = seed.randint(0, 20, 2 * n_samples).astype(float)
X_true = session.computed_.signal_matrix.values
n_samples = X_true.shape[0]
# X_true = X_true.reshape((n_samples, 2))
# Center the data
X_true -= X_true.mean()

similarities = euclidean_distances(X_true)

# Add noise to the similarities
noise = np.random.rand(n_samples, n_samples)
noise = noise + noise.T
noise[np.arange(noise.shape[0]),
      np.arange(noise.shape[0])] = 0
similarities += noise

mds = manifold.MDS(n_components=2, max_iter=3000,
                   eps=1e-12, random_state=seed,
                   dissimilarity="precomputed",
                   n_jobs=1)

pos = mds.fit(similarities).embedding_

nmds = manifold.MDS(n_components=2, metric=False,
                    max_iter=3000, eps=1e-12,
                    dissimilarity="precomputed",
                    random_state=seed,
                    n_jobs=1, n_init=1)

npos = nmds.fit_transform(similarities, init=pos)

# Rescale the data
pos *= np.sqrt((X_true ** 2).sum()) / np.sqrt((pos ** 2).sum())
npos *= np.sqrt((X_true ** 2).sum()) / np.sqrt((npos ** 2).sum())

# Rotate the data
clf = PCA(n_components=2)
X_true = clf.fit_transform(X_true)

pos = clf.fit_transform(pos)

npos = clf.fit_transform(npos)

fig = plt.figure(1)
ax = plt.axes([0.0, 0.0, 1.0, 1.0])

s = 100
plt.scatter(X_true[:, 0], X_true[:, 1], color="navy",
            s=s, lw=0, label="True Position")
plt.scatter(pos[:, 0], pos[:, 1], color="turquoise", s=s, lw=0, label="MDS")
plt.scatter(npos[:, 0], npos[:, 1], color="darkorange",
            s=s, lw=0, label="NMDS")
plt.legend(scatterpoints=1, loc="best", shadow=False)

similarities = similarities.max() / (similarities + EPSILON) * 100
np.fill_diagonal(similarities, 0)
# Plot the edges
start_idx, end_idx = np.where(pos)
# a sequence of (*line0*, *line1*, *line2*), where::
#            linen = (x0, y0), (x1, y1), ... (xm, ym)
segments = [[X_true[i, :], X_true[j, :]]
            for i in range(len(pos))
            for j in range(len(pos))
            ]
values = np.abs(similarities)
lc = LineCollection(segments, zorder=0, cmap=plt.cm.Blues,
                    norm=plt.Normalize(0, values.max()))

lc.set_array(similarities.flatten())
lc.set_linewidths(np.full(len(segments), 0.5))
ax.add_collection(lc)

plt.show()


In [None]:
X = session.computed_.signal_matrix
X.values.reshape(X.shape[0], 2)

In [None]:
# divmod(session.fmri_img.shape[-1], 2)
# session.computed_.keys()
# session.masker.__dict__.keys()


In [None]:
# # sns.heatmap(
# submap = session.masker._resampled_maps_img_
# submap.shape



In [None]:
# sub_cut_coords = niplot.find_probabilistic_atlas_cut_coords(submap)
# connectome_plot = niplot.plot_connectome(connect_mat[0], sub_cut_coords)


In [None]:
cpath = '/home/fnadeau/.linuxbrew/Cellar/isl/0.24/include/isl/hmap_templ.c'

# print(Path(cpath).read_bytes().splitlines()[2:])
help(list)

In [None]:
# session.masker._resampled_mask_img_.shape, session.masker._resampled_maps_img_.shape
# # session.mask_path
# 693/7
# 
# session.computed_.keys()
# inv_test = session.masker.inverse_transform(session.computed_.signal_matrix)

In [None]:
from nilearn.regions import Parcellations

parcells = Parcellations('ward', n_parcels=231, random_state=0,
                         mask=session.mask_img,
                         smoothing_fwhm=None, standardize=False, detrend=False,
                         low_pass=None, high_pass=None, t_r=session.t_r,
                         target_affine=session.masker._resampled_maps_img_.affine,
                         target_shape=session.mask_img.shape,
                         mask_strategy='whole-brain-template',
                         scaling=False, n_iter=10, n_jobs=11)
parcells.fit(inv_test)

In [None]:
# sorted(dir(parcells))
# help(parcells)
sig_test = parcells.transform(inv_test)

In [None]:
niplot.view_img(nimage.mean_img(parcells.components_img_))

In [None]:
!python3 -m pip install -U matplotlib

In [None]:
import inspect
dict(inspect.getmembers(parcells))

In [None]:
pd.DataFrame(connect_mat[0],
             index=session.computed_.signal_matrix.columns,
             columns=session.computed_.signal_matrix.columns).round(1)==session.computed_.signal_matrix.corr().round(1)

In [None]:
cm_test = CM(kind='correlation', discard_diagonal=True).fit([session.computed_.signal_matrix.values]).inverse_transform([session.computed_.signal_matrix.values])

In [None]:
testb = CM(kind='correlation').fit([cm_test[0]]).inverse_transform([cm_test[0]])

In [None]:
help(CM)

In [None]:
X = session.computed_.signal_matrix
dsm0 = X.corr('spearman')
dsm1 = pairwise_distances(X)
dsm2 = pairwise_distances(dsm0)

# pd.DataFrame(dsm0)


In [None]:
# ROI-Based Classification


# Initialize estimators
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
linsvc = LinearSVC(class_weight='balanced')
linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
ovrc = OneVsRestClassifier(svc, n_jobs=11)

validation_params = dict(test_size=0.4, shuffle=True,
                         random_state=None)

X_train, X_test, y_train, y_test = train_test_split(session.computed_.signal_matrix.T.values,
                                                    session.computed_.signal_matrix.columns,
#                                                     session.computed_.signal_matrix.in.tolist(),
                                                    **validation_params)
logreg.fit(X_train, y_train)

# len(list(filter(None, ovrc.predict(X_test)==y_test)))/len(y_test)
logreg.predict(X_test) == y_test

In [None]:
from nilearn.regions import Parcellations, ReNA

"""
score(self, imgs, confounds=None, per_component=False)
     Score function based on explained variance on imgs.
"""

# rena = ReNA(mask_img=session.mask_img,
#             n_clusters=divmod(session.fmri_img.shape[-1], 2)[0],
#             scaling=False, n_iter=10, threshold=1e-07)

# new_clusters = 
# rena.fit(session.computed_.signal_matrix.corr('spearman'))
# parcels = Parcellations(method='ward', n_parcels=divmod(session.computed_.signal_matrix.shape[1], 2)[0],
#                         smoothing_fwhm=None, mask=session.masker._resampled_mask_img_,
#                         mask_strategy='gm-template')
# parcels.fit(session.masker._resampled_maps_img_)

In [None]:
parcels.__dict__.keys()

In [None]:
def get_pairwise_contrasts(fmai_img,
                           events=None,
                           design_matrices=None
                           ) -> Bunch:
    list(itertools.combinations(x, 2))

In [None]:
import cbptools

In [None]:
# logreg.__dict__['coef_'].shape
display(pd.DataFrame(logreg.predict_log_proba(X), columns=logreg.classes_,
             index=tasks[2]).describe(),
        pd.DataFrame(logreg.predict_proba(X), columns=logreg.classes_,
             index=tasks[2]),
        pd.DataFrame(logreg.decision_function(X),
                     index=tasks[2], columns=logreg.classes_))

In [None]:
[row[1].mean() for row in session.computed_.signal_matrix.iterrows()].__len__()
from sklearn.metrics import explained_variance_score
# help(explained_variance_score)
# from sklearn.decomposition import PCA
# help(PCA)
ovrc

In [None]:
# sns.heatmap(session.computed_.signal_matrix.T.corr('spearman'))

# 

# sns.heatmap(session.computed_.signal_matrix.set_axis(
#     tasks[0], axis=0).sort_index().T.set_axis(tasks[2],axis=0).sort_index()

sns.heatmap(bwd_reduced_matrices[2].set_axis(tasks[1],axis=0).sort_index().corr('spearman'))



In [None]:
# parcels.variance_.shape
# niplot.view_img(parcels.labels_img_)
# help(parcels.connectivity_)
# parcels.nifti_maps_masker_
test = parcels.inverse_transform(session.computed_.signal_matrix)

In [None]:
# test.shape
# parcels.components_img_.shape


In [None]:
display(*[validate_model(ovrc, rfecv_matrices[task[0]], tasks[task[0]])
          for task in enumerate(tasks)])

In [None]:
display(*rfecv_matrices)

In [None]:
# session.computed_.signal_matrix.corr('spearman')
from collections import Counter

def clusterize(X: Union[np.ndarray, pd.DataFrame],
               thresh: float = 0.,
               ) -> pd.DataFrame:
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    corrmat = X.corr('spearman').where(X.corr('spearman').values !=
                                       np.tril(X.corr('spearman').values))

    cluster_names = dict([itm for itm in [(row[0], row[1].where(
                        row[1].values >= thresh).dropna().index.tolist())
                                for row in corrmat.iterrows()]
                          if itm[1] != []])
#     return [pd.Series(itm[1]).value_counts(ascending=False)]
    return cluster_names

# [session.computed_.signal_matrix[]
clusters = clusterize(session.computed_.signal_matrix)
clusters
# cluster_counts = [pd.Series(data=item[1], name=item[0]).value_counts(ascending=False)
#                   for item in tuple(clusters.items())]

In [None]:
any([itm.__contains__('superior occipital gyrus inferior rh') for itm in tuple(clusters.items())])

In [None]:
X = session.computed_.signal_matrix

# Spearman Rank Correlation
Xcorr = X.corr('spearman')
# Remove null or near-zero variance features
Xcorr.where(Xcorr.var().between(-0.05, 0.05)).dropna(axis=0, how='all')

# Xcorr.where(Xcorr.corr('spearman').values !=
#                                 np.tril(Xcorr.corr('spearman').values))

# sns.heatmap(pairwise_distances(Xcorr.fillna(0)))

In [None]:



from nilearn.regions import Parcellations, ReNA, RegionExtractor


# cluster_lists = [cortex.loc[[itm[0]]+[itm[1]]] for itm in cluster_list]

from matplotlib.pyplot import figure

figure(figsize=(16, 12), dpi=80)

from cimaq_decoding_utils import flatten
cortex['component']=range(cortex.shape[0])

cluster_maps = [nimage.index_img(cortex_atlas,
                        cortex.loc[flatten(cluster)].component.tolist())
                for cluster in cluster_list]
# niplot.plot_prob_atlas(cl03, display_mode='mosaic')
# plt.show()
# help(nilearn.regions.ReNA)

In [None]:
niplot.plot_roi(nimage.index_img(session.masker.maps_img, [1]))

In [None]:


X = session.computed_.signal_matrix
y = session.events.iloc[:, -1]

from sklearn.feature_selection import SequentialFeatureSelector

def classify_kbest_clustering(X: Union[np.ndarray, pd.DataFrame],
                              y: Iterable = None,
                              thresh: float = 0.75,
                              metric: str = 'spearman',
                              maps_img=cortex_atlas,
                              labels=cortex,
                              dst: Union[str, PathLike,
                                         PosixPath] = None,
                              ) -> dict:

    from sklearn.feature_selection import SelectKBest
    from sklearn.metrics import pairwise_distances
    from scipy.spatial.distance import correlation
    from scipy.stats import spearmanr
    from cimaq_decoding_utils import flatten

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    cluster_names = clusterize(X, thresh)
    
    if dst is not None:
        with open(dst, mode='w') as jfile:
            json.dump(cluster_names, jfile)
            jfile.close()

    cluster_names_ = [flatten(item) for item in
                     tuple(cluster_names.items())]

    cluster_means_ = pd.concat([X[clust].mean(axis=1)
                                for clust in cluster_names_],
                               axis=1).values
    
    cluster_means = pd.DataFrame(cluster_means_,
                                 index=X.index,
                                 columns=[clust[0] for clust
                                          in cluster_names_])    

    unclustered = X.copy(deep=True)
    unclustered.drop(flatten(cluster_names_),
                     axis=1, inplace=True)        

    new_signals = pd.concat([cluster_means, unclustered], axis=1)

    
    new_atlas = cluster_maps_labels(maps_img=maps_img, labels=labels,
                                    cluster_names=cluster_names_)

    return {'cluster_matrix': new_signals,
            'cluster_maps': new_atlas,
            'cluster_names': cluster_names}



In [None]:
def cluster_maps_labels(maps_img: Nifti1Image,
                        labels: pd.DataFrame,
                        cluster_names: list,
                        ) -> Bunch:

    from nilearn.image import concat_imgs, index_img, mean_img
    from cimaq_decoding_utils import flatten

    no_cluster = [label for label in labels
                  if label not in flatten(cluster_names)]

    no_cluster_idx = [itm[0] for itm in enumerate(labels)
                      if itm[1] in no_cluster]
    no_cluster_names = no_cluster
    no_cluster_maps = [nimage.index_img(maps_img, idx)
                       for idx in no_cluster_idx]
    no_cluster_values = list(zip(no_cluster_idx,
                                 no_cluster_names,
                                 no_cluster_maps))

    cluster_idx = [itm for itm in range(labels.shape[0])
                   if itm not in no_cluster_idx]
    cluster_names_ = [itm[0] for itm in cluster_names]
    cluster_maps = [mean_img(index_img(maps_img, clust_idx))
                    for clust_idx in cluster_idx]
    cluster_values = list(zip(cluster_idx, cluster_names_,
                              cluster_maps))

    whole_values = no_cluster_values + cluster_values
    whole_values = sorted(whole_values)

    new_labels = pd.DataFrame((((itm[0], itm[1]))
                               for itm in whole_values))
    new_maps_img = concat_imgs([itm[2] for itm in whole_values])

    return Bunch(**dict(maps=new_maps_img, labels=new_labels))


In [None]:
session.update(**classify_kbest_clustering(X))

In [None]:
bwd_reduced_matrices

In [None]:
cluster_masker = NiftiMapsMasker(maps_img=session.cluster_maps.maps,
                                mask_img=session.mask_img,
                                resampling_target='mask',
                                **session.masker_defs).fit()

In [None]:
cluster_fmri = cluster_masker.inverse_transform(session.cluster_matrix)

In [None]:
# from sklearn.pipeline import Pipeline
# from sklearn.feature_selection import SequentialFeatureSelector

# svc = SVC(kernel='linear',
#           cache_size=5,
#           decision_function_shape='ovr',
#           class_weight='balanced',
#           probability=True)
# linsvc = LinearSVC(class_weight='balanced')
# linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
# ovrc = OneVsRestClassifier(svc, n_jobs=11)

# rfecv_params = dict(estimator=ovrc, step=1,
#                     cv=None, n_jobs=11,
#                     importance_getter='auto',
#                     min_features_to_select=1,
#                     scoring='accuracy')

# # nftselect = lambda n: next((n+1 if n==1 else)) 
# mds = Pipeline([('rfecv', RFECV(**rfecv_params)),
#                 ('sfsf', SequentialFeatureSelector(ovrc, n_jobs=11,
#                                                    n_features_to_select=0.80,
#                                                    scoring='accuracy',
#                                                    direction='forward')),
#                 ('sfsb', SequentialFeatureSelector(ovrc, n_jobs=11,
#                                                    n_features_to_select=0.80,
#                                                    scoring='accuracy',
#                                                    direction='backward'))])

# pipeline_matrices = [session.cluster_matrix[mds.fit(session.cluster_matrix,
#                                                     tasks[task[0]]).get_feature_names_out()]
#                      for task in tqdm_(enumerate(tasks))]

In [None]:
from sklearn.feature_selection import SequentialFeatureSelector

# Initialize estimators
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
linsvc = LinearSVC(class_weight='balanced')
linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
ovrc = OneVsRestClassifier(svc, n_jobs=11)

rfecv_params = dict(step=1,
                    cv=None, n_jobs=11,
                    importance_getter='auto',
                    min_features_to_select=4,
                    scoring='accuracy')
sfs_params = dict(n_jobs=11, n_features_to_select=0.75)

In [None]:
redux_matrix = [session.cluster_matrix[SequentialFeatureSelector(
                    ovrc, direction='backward', **sfs_params).fit(
                        session.cluster_matrix[SequentialFeatureSelector(
                            ovrc, direction='forward', **sfs_params).fit(
                                session.cluster_matrix[RFECV(
                                    ovrc, **rfecv_params).fit(
                                         session.cluster_matrix,
                                    tasks[task[0]]).get_feature_names_out()],
                            tasks[task[0]]).get_feature_names_out()],
                tasks[task[0]]).get_feature_names_out()]
                for task in tqdm_(enumerate(tasks))]

In [None]:
display(*redux_matrix)

In [None]:
# session.cluster_maps.maps.shape
# tuple(session.computed_.keys())
base_contrasts = [session.computed_['whole']['contrast_img'],
                  session.computed_['trial_type']['contrast_img'],
                  session.computed_['recognition_performance']['contrast_img'],
                  session.computed_['ctl_miss_ws_cs']['contrast_img']]
[img.shape for img in base_contrasts]

# [type(session.computed_[key]) for key in tuple(session.computed_.keys())]
# [(key, session.computed_[key]['contrast_img']) for key in tuple(session.computed_.keys())[:3]
#  if 'contrast_img' in tuple(session.computed_[key].keys())
#  and isinstance(session.computed_[key], dict)]

In [None]:
display([mat.shape for mat in rfecv_matrices],
        [mat.shape for mat in fwd_reduced_matrices],
        [mat.shape for mat in bwd_reduced_matrices])

In [None]:
# import inspect
# import sklearn

# SVCGen = get_class(sklearn.svm,'SVC', cls_kws=dict(kernel='linear',
#                                      cache_size=5,
#                                     decision_function_shape='ovr',
#                                     class_weight='balanced',
#                                     probability=True))

# help(SVCGen.send)

In [None]:
# # next(get_class('','OneVsRestClassifier'))
# # # sorted(dir(sklearn.multiclass))
# def get_class(module_name, class_name, cls_kws=None):
#     if cls_kws is None:
#         cls_kws = {}
#     yield dict(inspect.getmembers(module_name))[class_name](**cls_kws)
# # 
# def gen_svc():
#     while True:
#         yield next(get_class(sklearn.svm,'SVC', cls_kws=dict(kernel='linear',
#                                      cache_size=5,
#                                     decision_function_shape='ovr',
#                                     class_weight='balanced',
#                                     probability=True)))
# next(gen_svc())


In [None]:
[mat.shape for mat in fwd_reduced_matrices]

In [None]:
bwd_reduced_matrices = [fwd_reduced_matrices[task[0]][SequentialFeatureSelector(ovrc, n_jobs=11,
                                                                                direction='backward').fit(
                            fwd_reduced_matrices[task[0]], tasks[task[0]]).get_feature_names_out()]
                        for task in tqdm_(enumerate(tasks))]

In [None]:
display(*fwd_reduced_matrices)

In [None]:
sp_cor = pairwise_distances(X.corr('spearman'))
pd.DataFrame(sp_cor)
tests = [X.corr('spearman'),
        (1-X.corr('spearman')),
        pd.DataFrame(pairwise_distances(X.T),
                     index=X.columns,columns=X.columns),
        pd.DataFrame(pairwise_distances(X.corr('spearman')),
                     index=X.columns,columns=X.columns),
        pd.DataFrame(1-pairwise_distances(X.corr('spearman')),
                     index=X.columns,columns=X.columns)]

cluster_names_test = [dict([itm for itm in [(row[0], row[1].where(
                    row[1].values>=thresh).dropna().index.tolist())
                            for row in test.iterrows()]
                      if itm[1] != []])
                 for test in tests]

In [None]:
no_cluster_maps = list(zip(cortex.loc[no_cluster].component.values,
                           nimage.index_img(cortex_atlas,
                                            cortex.loc[no_cluster].component.values)))
# cortex.shape

In [None]:
alt_maps_masker = NiftiMapsMasker(maps_img=cortex_atlas3,
                                  mask_img=session.mask_img,
                                  t_r=session.t_r,
                                  resampling_target='mask',
                                  **session.masker_defs).fit()

In [None]:
alt_brain_map = alt_maps_masker.inverse_transform(newsignals)

In [None]:

no_cluster_maps = list(in)
cluster_maps = [nimage.mean_img(nimage.index_img(cortex_atlas, cortex.loc[cluster].component))
                for cluster in cluster_names]


whole_idx = no_cluster_idx+list(zip(cluster_maps_idx, cluster_maps))
whole_idx = sorted(whole_idx)
newsignals = pd.concat([pd.concat([X[clust].mean(axis=1) for clust in
                        cluster_names], axis=1),
                         X.copy(deep=True).drop(flatten(cluster_names), axis=1)],axis=1)
newsignals

In [None]:
new_maps_masker = NiftiMapsMasker(maps_img=)

In [None]:
def selkbest_multi(k, x, y):
    yield from (selkbest(k, X, task[1]) for task in enumerate(y))


def get_kbest_multilabels(estimator, k, X, y, thresh=0.95, nbootstrap=10):
    cond = lambda e, k, mod, y, thresh: (validate_model(e, mod, y).accuracy <= thresh).all()
    while True:
        rfe = RFE(estimator, n_features_to_select=k)
        
list(get_kbest_multilabels(ovrc, 4, X, tasks, thresh=0.75, nbootstrap=100))
# [x[0] for x in scores if x[1] == max(list(x[1] for x in scores))][0]

In [None]:
def intgen(estimator, k, cond: callable = lambda e,x,y: validate_model(e, X, y).accuracy.all() >= thresh):
    X_new = SelectKBest
    while cond(estimator, X, y):
        print(cond.accuracy.mean())
        k +=1
        
        continue
        break
intgen(ovrc, 0)   

In [None]:
k = 2
y = tasks
estimator = ovrc
thresh = 0.95
condi = lambda e, X, y, t: validate_model(e, X, y) <= t
models = ([validate_model(selkbest(k, X, task[1]))
           for task in enumerate(y)]
          for k in tqdm_(range(k, X.shape[1])))
# X_new = next(models)
# while condi(estimator, X_new, y, ):
#     continue
    
# while validate_model(estimator, X, y).accuracy.all() <= thresh:
#     next()

In [None]:
# result = [(X.iloc[:, SelectKBest(k=k).fit(X, task[1]).get_support(indices=True)]
#            for k in range(k, X.shape[1]))
#           for task in enumerate(y)]
# scores = [validate_model(estimator, next(result[task[0]]), task[1]) <= thresh
#         for task in enumerate(y)]
# while [score.any()<=thresh for score in scores]:
#     k+=1
#     print(k)
#     continue
#     break
    

    
def ordertest(A):
    for i in range( len(A) - 1 ):
        if A[i] < A[i+1]:
            return i
        return True

    
def val_gen(estimator, k, X, y):
    scores = [[]*len(y)]
    nk = []
    for task in enumerate(y):
        validity = (validate_model(estimator,
                                    next(selkbest(k, X, y[task[0]])),
                                    y[task[0]]).accuracy.mean()
                    for k in range(k, X.shape[1]))
    
        for idx in range(1, X.shape[1]):
            scores.append(next(validity))
            if k > 2:
                if scores[task[0]][idx-1]<scores[idx]:
                    continue
                else:
                    nk.append(k)

    return nk
#     scores.append(next(val_gen))
#     if 
#     for task in enumerate(y):
        
#     if [v < thresh for v in validity]:
#         k+=1
#         continue
#     else:
#         return validity
    
tst=val_gen(ovrc, 2, X, tasks)

In [None]:
def score_gen(estimator, k, X, y, thresh):

    scores = [val_gen(estimator, k=k, X=X, y=task[1], thresh=thresh)
              for task in tqdm_(enumerate(y))]
    return scores


# conseq_decrease = lambda l: [l[i] <= l[i+1] for i in range(len(l))]

conseq_decrease(list(score_gen(estimator=ovrc, k=2, X=X, y=tasks, thresh=0.9)))

In [None]:
def kb(estimator, k, X, y, thresh=0.9):
    k=k
    acc = [validate_model(estimator=estimator,
                                    X=next(selkbest(k, X, task[1])),
                          y=task[1]).accuracy.mean()
           for task in tqdm_(enumerate(y))]
    while  np.mean(acc) < thresh:
        k+=1
        continue

        break
    scores.append(k)
    return scores
    
    
#             <= thresh).any():
#         for k in tqdm_(range(k, X.shape[1])):
#             if cond.any():
#             k+=1
#             continue
#         else:
#             return k

kb(estimator=ovrc, k=2, X=X, y=tasks, thresh=0.9)

In [None]:
from sklearn.feature_selection import SelectKBest
from cimaq_decoding_utils import flatten

svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)

def selkbest(k, X, y):
    yield X.iloc[:, SelectKBest(k=k).fit(X, y).get_support(indices=True)]
    
def get_kbest(estimator, k, X, y, thresh=0.95, nbootstrap=10):
    
    cond = lambda e, k, mod, y, thresh: (validate_model(e, mod, y).accuracy <= thresh).all()
    scores = []
    for nboot in tqdm_(range(nbootstrap)):
        while True:
            mod = next(selkbest(k, X, y))
            k += 1
            if not cond(estimator, k, mod, y, thresh):
                break
        scores.append((mod, validate_model(estimator, mod, y).accuracy.mean()))
    common_shape = Counter(list(x[0].shape[1] for x in scores)).most_common(1)[0][0]
    return common_shape
#     scores = [x for x in scores if x[0].shape[1] == common_shape]
#     return scores
#     return X.iloc[:, RFE(estimator).fit([x[0] for x in scores
#             if x[1] == max(x[1] for x in scores)][0], y).get_support(indices=True)]
#     feature_names = flatten([mod.columns.tolist() for mod in scores])
#     
    
#     return X[[x[0] for x in Counter(feature_names).most_common(min_shape+1)]]
#     return [score for score in scores if
#             score.shape == min(list(x.shape for x in scores))][0]

def get_kbest_bootstrap(estimator, k, X, y, thresh=0.95, nboot=10):
    from collections import Counter
    nk = Counter((get_kbest(estimator, k, X, y, thresh)
                    for n in tqdm_(range(nboot)))).most_common(1)[0][0]
    return X.iloc[:, SelectKBest(k=nk).fit(X, y).get_support(indices=True)]
#     cond = (validate_model(estimator, next(selkbest(k, X, y)), y).accuracy <=0.95).all()
#     if cond:
#         k += 1
#     else:
#         return k
get_kbest(svc, k=2, X=session.computed_.signal_matrix,
                    y=session.events.iloc[:, -1],#recognition_performance,#iloc[:, -1],#recognition_performance,
                    thresh=0.90, nbootstrap=100)
#           nboot=50)
    

In [None]:

#     if cv_kws is None:
#         cv_kws = {}
#     if cv is not None:
#         cv = cv(**cv_kws)

#         X_00 = min_err(X, y)
#         skb = SelectKBest(k=divmod(X_00.shape[1], 2)[0]).fit(X_00, y)
#         _X_new_ = X.iloc[:, skb.get_support(indices=True)]

#         X_new = X.iloc[:, rfe.get_support(indices=True)]

        
#         rfecv = RFECV(estimator, step=step,
#                       cv=cv, n_jobs=n_jobs,
#                       importance_getter='auto',
#                       min_features_to_select=min_features_to_select,
#                       scoring=scoring).fit(X.copy(deep=True), task[1])

#     return X.iloc[:, rfecv.get_support(indices=True)]

#     rfe = RFE(estimator, step=1,
#               importance_getter='auto',
#               n_features_to_select=0.5).fit(X_best, y)
#     return X_best.iloc[:, rfe.get_support(indices=True)]
# Initialize estimators
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
# sdgc = SGDClassifier(class_weight='balanced', n_jobs=11)
linsvc = LinearSVC(class_weight='balanced')
linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
ovrc = OneVsRestClassifier(svc, n_jobs=11)

recursive_best = [get_optimal_features_recursive(ovrc, kbest_by_task[task[0]],
                                                 task[1], step=1, n_jobs=11)
                  for task in enumerate(tasks)]

display(*recursive_best)

In [None]:
recursive_names = list(set(flatten(list(rbest.columns.tolist() for rbest in recursive_best))))
recursive_idx = cortex.loc[recursive_names].component.values.tolist()
recursive_maps = nimage.index_img(cortex_atlas, recursive_idx)
# recursive_maps.shape

new_trial_imgs = maps_masker.fit_transform(signal_matrix, sample_mask=recursive_idx)

In [None]:
tst00 = [maps_masker.inverse_transform(rbest) for rbest in recursive_best]

In [None]:
# X_cluter2 = [classify_kbest_clustering(recursive_best[task[0]], task[1])
#              for task in enumerate(tasks)]
[x.columns for x in recursive_best]
rbest0, rbest1, rbest2 = recursive_best
intersect = (rbest0.columns.intersection(rbest1.columns).tolist() +
             rbest1.columns.intersection(rbest2.columns).tolist() +
             rbest0.columns.intersection(rbest2.columns).tolist())

rcopy = [rbest.copy(deep=True) for rbest in recursive_best]

[[rbest.drop(col, axis=1, inplace=True)
  for col in intersect
  if col in rbest.columns]
 for rbest in rcopy]

display(*rcopy)

In [None]:
display(*[validate_model(ovrc, rcopy[task[0]], task[1])
          for task in enumerate(tasks)])

In [None]:
# from builtins import FutureWaning, UserWarning
from cimaq_decoding_utils import validate_model
from sklearn.linear_model import SGDClassifier
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)
# warnings.filterwarnings(action='ignore', category=FutureWaning,
#                         module='sklearn.utils')
# Set y values (label) for each classification task
tasks = (session.events.trial_type,
         session.events.recognition_performance,
         session.events.iloc[:, -1])

# Initialize estimators
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
sdgc = SGDClassifier(class_weight='balanced', n_jobs=11)
linsvc = LinearSVC(class_weight='balanced')
linovrc = OneVsRestClassifier(linsvc, n_jobs=11) 
ovrc = OneVsRestClassifier(svc, n_jobs=11)

# Run cross-validation for each classification task
validation_results = [validate_model(ovrc,
                                     X=session.computed_.signal_matrix.values,
                                     y=task.values,
                                     test_size=0.2,
                                     cv=StratifiedKFold(shuffle=True))
                      for task in tasks]


test_opt = [get_optimal_features_recursive(ovrc,
                                           X=session.computed_.signal_matrix,
                                           y=task,
                                           min_features_to_select=1,
                                           n_jobs=11,
                                           scoring='accuracy',
                                           step=1,
                                           cv=StratifiedKFold(shuffle=True))
            for task in tasks]

# Run cross-validation for each classification task
validation_results_new = [validate_model(ovrc,
                                         X=test_opt[task[0]].values,
                                         y=task[1].values,
                                         test_size=0.2,
                                         cv=StratifiedKFold(shuffle=True))
                          for task in enumerate(tasks)]

# Show classification results
display(*test_opt, *validation_results, *validation_results_new)


In [None]:
# (validate_model(svc, session.computed_.signal_matrix, session.events.trial_type).accuracy ==1).all()
# np.array([['a','b',['c', 'd']]]).flatten()
# from collections import Counter
# session.computed_.signal_matrix
# display(*kbest_by_task)
display(*[validate_model(ovrc, kbest_by_task[task[0]], task[1])
          for task in enumerate(tasks)])
# [x[0] for x in Counter(list('asdfghhjj')).most_common(5)]

In [None]:
tst=[session.computed_.signal_matrix.iloc[:, SelectKBest(k=20).fit(session.computed_.signal_matrix,
                                                            task).get_support(indices=True)]
     for task in tasks]
display(*[validate_model(ovrc, X=tst[task[0]], y=tasks[task[0]]) for task in enumerate(tst)])

In [None]:
validate_model(svc, X=get_kbest(svc, k=1, X=session.computed_.signal_matrix,
                    y=session.events.iloc[:, -1],#recognition_performance,
                    thresh=0.91, nbootstrap=400),
               y=session.events.iloc[:, -1])

In [None]:
# Counter([(itm.to_dict(),) for itm in session.computed_.matrices])
# (session.computed_.matrices[0],)

In [None]:
# cortex.T.filter(regex='pedunc|chiasm|corticospinal').loc['component'].values
# 
# list(factorGenerator(693))
# divmod(693, 2)

In [None]:
# [name for name in cortex_names if 'tract' in name]

In [None]:
# di1024 = get_difumo(1024, 3, '/data/simexp/fnadeau/nilearn_atlases/difumo_atlases/')
# v03 = get_fmri_sessions(topdir=fmriprep_dir, events_dir=events_dir, ses_id="V03")
# sessions = [fetch_fmriprep_session(session=sess) for sess in tqdm_(v03)]
# from cimaq_decoding_utils import save_masker
# from builtins import FutureWarning
# import warnings
# warnings.filterwarnings(action='ignore', category=FutureWarning)
# [(sess.update({'masker': NiftiMapsMasker(maps_img=di1024.maps,
#                                         mask_img=sess.mask_img,
#                                         resampling_target='mask',
#                                         t_r=get_t_r(sess.fmri_img),
#                                         **sess.masker_defs).fit()}),
# save_masker(masker_dir, session=sess)) for sess in tqdm_(sessions)]


In [None]:
# from IPython.display import Audio
# sound_file = './sound/beep.wav'


In [None]:
from sklearn.model_selection import train_test_split
from cimaq_decoding_utils import get_masker, save_masker

In [None]:
# masker_prefix = '_'.join([session.sub_id, session.ses_id,
#                           f'task-{session.task}',
#                           f'space-{session.space}'])
# next(Path(masker_dir).rglob(f'*{masker_prefix}*.pickle'))

In [None]:
from sklearn.preprocessing import MaxAbsScaler

tasks = (session.events.trial_type,
         session.events.recognition_performance,
         session.events.iloc[:, -1])

validation_results = [validate_model(ovrc,
                                     X=session.computed_.regressed_matrix.values,
                                     y=task.values,
                                     test_size=0.2)
#                                      cv=StratifiedKFold())
                      for task in tasks]
display(*validation_results)

In [None]:
# session.computed_.matrices[2].where(session.computed_.matrices[0].values<=0).isna().sum()
# len(np.array(tasks[0]).shape)
import sklearn
sorted(sklearn.metrics.SCORERS.keys())

# sklearn.metrics.SCORERS['accuracy']
# help(sklearn.metrics.SCORERS['top_k_accuracy'])

In [None]:
# session.computed_.signal_matrix.where(session.computed_.signal_matrix.values<=0).isna().sum()

In [None]:
# display(*[weight.where(weight.values<=0).isna().sum() for weight in session.computed_.weights])

In [None]:
# display(pd.concat([mat.reset_index(drop=False).iloc[:, 0:3] for mat in session.computed_.matrices], axis=1))
# display(session.computed_.matrices[0],
#         session.computed_.signal_matrix)
#         session.computed_.weights)
#         session.computed_.signal_matrix)
help(warnings.filterwarnings)

In [None]:
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import SelectFromModel
from sklearn.feature_selection import RFECV, RFE
from sklearn.preprocessing import Binarizer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import RepeatedKFold
from sklearn.ensemble import RandomForestRegressor
# warnings.filterwarnings(action='ignore', category=UserWarning,
#                         module='sklearn.model_selection')
# warnings.filterwarnings(action='ignore', category=FutureWarning,
#                         module='sklearn.utils')
# y = session.events.iloc[:, -1]

tasks = (session.events.trial_type,
         session.events.recognition_performance,
         session.events.iloc[:, -1])

linsvc = LinearSVC(class_weight='balanced')

svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)

ovrc = OneVsRestClassifier(linsvc, n_jobs=11)

# .fit(X=X, y=task)
# from cimaq_decoding_utils import factorGenerator



In [None]:
# display(*test_opt)
# display(*[np.sum(np.mean(mat.loc['Ctl'].values)) for mat in session.computed_.weighted_matrices[1:]])
ctl_weight = pd.concat([mat.loc[['Ctl']] for mat in session.computed_.weights]).mean()
X_test = session.computed_.signal_matrix.set_axis(session.events.iloc[:, -1], axis=0)

X_test = pd.DataFrame(((((row[1].values - ctl_weight)) if row[0]!='Ctl'
               else row[1].values for row in X_test.iterrows())),
             index=X_test.index, columns=X_test.columns)
# X_test.where(X_test.loc['Ctl'].index=='Ctl')
# X_test.where(X_test.loc['Ctl'].index=='Ctl')

#                                         session.computed_.signal_matrix - ctl_weight)
# X_test
# ctl_weight.shape, session.computed_.signal_matrix.shape
# session.computed_.signal_matrix.shape[1]/4
# [idx for idx in enumerate(weigh.index)
#  if idx[1] == 'Ctl']

session.computed_.matrices[1]

In [None]:
# from builtins import FutureWarning
# import warnings
# warnings.filterwarnings(action='ignore', category=FutureWarning)

# from sklearn.feature_selection import RFECV, RFE
# from sklearn.preprocessing import Binarizer
# from sklearn.inspection import permutation_importance
# from sklearn.model_selection import RepeatedKFold
# from sklearn.ensemble import RandomForestRegressor

# X = session.computed_.signal_matrix
# # y = session.events.iloc[:, -1]

# # tasks = (session.events.trial_type,
# #          session.events.recognition_performance,
# #          session.events.iloc[:, -1])

# # .fit(X=X, y=task)

# rec = RFECV(ovrc, step=3,
#             min_features_to_select=2,
#             scoring='accuracy')
# X_best = [X.iloc[:, rec.fit(X, y).get_support(indices=True)]
#           for y in tqdm_(tasks, desc='Recursively searching optimal features')]


In [None]:
# display(*X_best)

In [None]:
best_of = Bunch(**dict(tuple((task[1].name, test_opt[task[0]])
                             for task in enumerate(tasks))))
session.computed_.best_rois = best_of

In [None]:
# from cimaq_decoding_utils import flatten
cortex2 = cortex.reset_index().reset_index().set_index('difumo_names')
# di1024.labels = di1024.labels.reset_index().reset_index().set_index('difumo_names')
# cortex.where(cortex.index.tolist() ==
#              flatten([x.columns.tolist() for x in X_best])).reset_index()
# cortex.loc[flatten([x.columns.tolist() for x in X_best])]

In [None]:
trial_atlases = [nimage.index_img(cortex_atlas,
                                  cortex2.loc[tst.columns]['index'].values.tolist())
                 for tst in tqdm_(test_opt)]

trial_atlases_dict = Bunch(**dict(tuple((task[1].name, trial_atlases[task[0]])
                                        for task in enumerate(tasks))))
session.computed_.trial_best_features_imgs = trial_atlases_dict

In [None]:
wholebrain_trial_imgs = Bunch(**dict(tuple((f'Condition Type {trial[0]}', Bunch(**dict(
                            tuple((cond, maps_masker.inverse_transform(
                                session.computed_.signal_matrix.set_axis(
                                    tasks[trial[0]].values, axis=0).loc[cond].values)))
                                  for cond in tqdm_(np.unique(tasks[trial[0]].values)))))
                               for trial in tqdm_(enumerate(trial_atlases)))))
session.computed_.wholebrain_trial_imgs = wholebrain_trial_imgs

In [None]:
wholebrain_ctl_imgs = Bunch(**dict(tuple((key, wholebrain_trial_imgs[key].Ctl)
                                         for key in tuple(wholebrain_trial_imgs.keys()))))
ctl_id = 'Ctl'

[wholebrain_trial_imgs[key].pop(ctl_id)
 for key in tuple(wholebrain_trial_imgs.keys())]
session.computed_.wholebrain_trial_imgs = wholebrain_trial_imgs
session.computed_.wholebrain_ctl_imgs = wholebrain_ctl_imgs

In [None]:
best_rois_trial_imgs = Bunch(**dict(tuple((f'Condition Type {trial[0]}', Bunch(**dict(
                 tuple((cond, NiftiMapsMasker(maps_img=trial_atlases[trial[0]],
                                              mask_img=session.mask_img,
                                              resampling_target='mask',
                                              t_r=get_t_r(session.fmri_img),
                                              **session.masker_defs).fit().inverse_transform(
                         test_opt[trial[0]].set_index(tasks[trial[0]].values).loc[cond].values)))
                       for cond in tqdm_(np.unique(tasks[trial[0]].values)))))
                               for trial in tqdm_(enumerate(trial_atlases)))))

best_rois_ctl_imgs = Bunch(**dict(tuple((key, best_rois_trial_imgs[key].Ctl)
                                         for key in tuple(best_rois_trial_imgs.keys()))))
ctl_id = 'Ctl'

[best_rois_trial_imgs[key].pop(ctl_id)
 for key in tuple(best_rois_trial_imgs.keys())]
session.computed_.best_rois_trial_imgs = best_rois_trial_imgs
session.computed_.best_rois_ctl_imgs = best_rois_ctl_imgs


In [None]:
session.computed_.best_rois_trial_imgs['Condition Type 1']

In [None]:
from cimaq_decoding_utils import flatten
condition_imgs = flatten([[session.computed_.best_rois_trial_imgs[key[1]][cond]
                           for cond in tuple(session.computed_.best_rois_trial_imgs[key[1]].keys())]
                          for key in enumerate(tuple(session.computed_.best_rois_trial_imgs.keys()))])

condition_keys = flatten([[cond for cond in tuple(session.computed_.best_rois_trial_imgs[key[1]].keys())]
                          for key in enumerate(tuple(session.computed_.best_rois_trial_imgs.keys()))])


best_roi_condition_imgs = Bunch(**dict(tuple(zip(condition_keys, condition_imgs))))
session.computed_.best_roi_condition_imgs = best_roi_condition_imgs

In [None]:
display(*[niplot.view_img(nimage.mean_img(session.computed_.best_roi_condition_imgs[key[1]]),
                          title=f'Task Condition {key[1]}',
#                           display_type='mosaic',
                          dim=-1, opacity=0.8,
                          bg_img=session.anat_img)
         for key in enumerate(tuple(session.computed_.best_roi_condition_imgs.keys()))])

In [None]:
[img.shape for img in tuple(session.computed_.wholebrain_ctl_imgs.values())]

In [None]:
display(*[niplot.view_img(nimage.mean_img(session.computed_.wholebrain_ctl_imgs[key[1]]),
                          title=f'Control Condition {key[0]}',
#                           display_type='mosaic',
                          dim=-1, opacity=0.8,
                          bg_img=session.anat_img)
         for key in enumerate(tuple(session.computed_.wholebrain_ctl_imgs.keys()))])

In [None]:
display(*[niplot.view_img(nimage.mean_img(session.computed_.best_rois_ctl_imgs[key[1]]),
                          title=f'Control Condition {key[0]}',
#                           display_type='mosaic',
                          dim=-1, opacity=0.8,
                          bg_img=session.anat_img)
         for key in enumerate(tuple(session.computed_.best_rois_ctl_imgs.keys()))])

In [None]:
control_trial_imgs = [trial_imgs[key].Ctl for key in tuple(trial_imgs.keys())]
[ctl_img.shape for ctl_img in control_trial_imgs], len(control_trial_imgs)

In [None]:
"""
dict_keys(['whole', 'trial_type', 'recognition_performance',
           'ctl_miss_ws_cs', 'matrices', 'weights',
           'weighted_matrices', 'signal_matrix'])
"""

In [None]:
# display(*[mat.iloc[:, 1:4] for mat in session.computed_.matrices])
display(*session.computed_.weights)

In [None]:
wholebrain_trial_imgs

In [None]:
wholebrain_trial_imgs.ctl_trial_imgs = wholebrain_ctl_imgs

In [None]:
wholebrain_trial_imgs['Condition Type 1']

In [None]:
enc_minus_ctl = \
    NiftiMapsMasker(maps_img=cortex_atlas,
                    mask_img=session.mask_img,
                    resampling_target='mask',
                    t_r=get_t_r(session.fmri_img),
                    **masker_defs).fit().inverse_transform(NiftiMapsMasker(maps_img=cortex_atlas,
                    mask_img=session.mask_img,
                    resampling_target='mask',
                    t_r=get_t_r(session.fmri_img),
                    **masker_defs).fit().transform_single_imgs(nimage.mean_img(wholebrain_trial_imgs['Condition Type 0'].Enc)) -
     NiftiMapsMasker(maps_img=cortex_atlas,
                    mask_img=session.mask_img,
                    resampling_target='mask',
                    t_r=get_t_r(session.fmri_img),
                    **masker_defs).fit().transform_single_imgs(nimage.mean_img(wholebrain_trial_imgs['Condition Type 0'].Ctl)))


In [None]:
display(niplot.view_img(enc_minus_ctl, title=f'Encoding Minus Control',
                        dim=-1, opacity=0.8, bg_img=session.anat_img))

In [None]:
display(*[niplot.view_img(nimage.mean_img(trial_imgs[key[1]].Ctl),
                          title=f'Control Condition {key[0]}',
                          display_type='mosaic',
                          dim=-1, opacity=0.8,
                          bg_img=session.anat_img)
         for key in enumerate(tuple(trial_imgs.keys()))])

In [None]:
display(*[niplot.view_img(nimage.mean_img(trial_imgs[key[1]].Ctl),
                          title=f'Control Condition {key[0]}',
                          display_type='mosaic',
                          dim=-1, opacity=0.8,
                          bg_img=session.anat_img)
         for key in enumerate(tuple(trial_imgs.keys()))])

In [None]:
control_mean_img = nimage.mean_img(nimage.concat_imgs([trial_imgs[key].Ctl
                                                      for key in tuple(trial_imgs.keys())]))
control_img = nimage.concat_imgs([trial_imgs[key].Ctl
                                  for key in tuple(trial_imgs.keys())])

niplot.view_img(nimage.mean_img(control_img),
                opacity=0.8, 
                title='Control Trial Activity',
                dim=-1, display_mode='mosaic',
                bg_img=session.anat_img)


In [None]:
# display(*X_best)
# sorted(dir(niplot))
# help(niplot.plot_prob_atlas)
control_img.shape

In [None]:
mean_trial_imgs = flatten([[nimage.mean_img(trial_imgs[key][trial])
                            for trial in trial_imgs[key] if trial != 'Ctl']
                   for key in tuple(trial_imgs.keys())])

trial_keys = flatten([[f'{key}_{trial}' for trial in trial_imgs[key]
              if trial != 'Ctl']
              for key in tuple(trial_imgs.keys())])
mean_trials = dict(tuple(zip(trial_keys, mean_trial_imgs)))

display(*[niplot.view_img(mean_trials[key],
                          title=key, dim=-1, opacity=0.8,
                          bg_img=session.anat_img)
         for key in tuple(mean_trials.keys())])

In [None]:
# from sklearn.feature_selection import SelectKBest, SelectPercentile
# from sklearn.feature_selection import SelectFpr, SelectFdr, SelectKBest
# from sklearn.feature_selection import VarianceThreshold, SelectFwe
# ########################################################################
# # from sklearn.feature_selection import GenericUnivariateSelect

# X, y = session.computed_.signal_matrix, session.events.trial_type.tolist()


# def select_low_error_features(X: Iterable, y: Iterable,
#                               score_func: Union[callable,
#                                                 str] = 'f_classif',
#                               percentile: int = None,
#                               alpha: float = 5e-2,
#                               k: int = None,
#                               **kwargs
#                               ) -> np.ndarray:
#     """
#     Return the common features from different error correction methods.
    
    
#     The methods used control for FDR, FPR and FWE.
#     """

#     from inspect import getmembers
#     from sklearn import feature_selection
#     if isinstance(score_func, str):
#         score_func = dict(getmembers(feature_selection))[score_func]

#     sfpr = SelectFpr(score_func, alpha=alpha)
#     sfdr = SelectFdr(score_func, alpha=alpha)
#     sfwe = SelectFwe(score_func, alpha=alpha)
#     opt = []
#     ests = (sfpr, sfdr, sfwe)
#     if isinstance(X, np.ndarray):
#         X = pd.DataFrame(X)
#     [est.fit(X=X, y=y) for est in ests]
#     [opt.extend(est.get_support(indices=True).tolist()) for est in ests]
#     opt = list(set(opt))
#     X_new = X.iloc[:, opt]
# #     return X_new

#     if percentile is not None:
#         meta = SelectPercentile(score_func, percentile)
#     if k is not None:
#         meta = SelectKBest(score_func=score_func, k=k)
#     opt_new = meta.fit(X_new, y).get_support(indices=True).tolist()
#     if isinstance(X, np.ndarray):
#         return X_new.iloc[:, opt_new].values
#     else:
#         return X_new.iloc[:, opt_new]

#     # gus.fit(X, y).get_feature_names_out().shape
# #     sfpr.fit(X=X, y=y), sfdr.fit(X=X, y=y), sfwe.fit(X=X, y=y)
# #     best_idx, best_names = [], []
# # #     if hasattr(sfpr, 'feature_names_in_'):
# #     [best_names.extend(est.get_feature_names_out().tolist())
# #      for est in ests]
# #     best_names = list(set(flatten(best_names)))
# #     return np.argsort(best_names), best_names

# #     return X.iloc[:, np.argsort(best_names)], X[best_names]
# #     X_new = sp.fit(X[best_names], y).get_feature_names_out()
# #     return X_new

# # #     else:
# #     [best_idx.extend(np.argsort(est.transform(X)).tolist())
# #      for est in (sfpr, sfdr, sfwe)]
# #     best_idx = list(set(flatten(best_idx)))
# #     X_new = sp.fit(pd.DataFrame(X).iloc[best_idx],
# #                    y).transform(X[best_idx])
# #     return X_new
    
# #     if K is not None:
# #         
# #         X_new = 
                    

# a = select_low_error_features(X, y, k=2)
# display(a)


In [None]:
# # help(RFECV)
# # y = Binarizer().fit_transform(session.events.iloc[:, -1])

# def get_score_after_permutation(model, X, y, curr_feat):
#     """ return the score of model when curr_feat is permuted """

#     X_permuted = X.copy()
#     col_idx = list(X.columns).index(curr_feat)
#     # permute one column
#     X_permuted.iloc[:, col_idx] = np.random.permutation(
#         X_permuted[curr_feat].values)

#     permuted_score = model.score(X_permuted, y)
#     return permuted_score


# def get_feature_importance(model, X, y, curr_feat):
#     """ compare the score when curr_feat is permuted """

#     baseline_score_train = model.score(X, y)
#     permuted_score_train = get_score_after_permutation(model, X, y, curr_feat)

#     # feature importance is the difference between the two scores
#     feature_importance = baseline_score_train - permuted_score_train
#     return feature_importance


# # curr_feat = 'MedInc'

# # feature_importance = get_feature_importance(model, X_train, y_train, curr_feat)
# # print(f'feature importance of "{curr_feat}" on train set is'
# #       f'{feature_importance:.3}')


In [None]:
# # nk = 2
# estimator = ovrc
# X = session.computed_.signal_matrix
# y = session.events.iloc[:, -1]
# metric = 'precision'

# def kbest_iter(estimator,
#                X: Iterable, y: Iterable,
#                metric: str = 'precision',
#                nk: int = 2,
#                test_size: float = 0.8,
#                nbootstrap: int = 20,
#                cv: Union[int, callable] = None,
#                stratify: Iterable = None,
#                random_state: int = None,
#                **kwargs
#                ) -> Iterable:
#     data = []
#     validation_params = dict(test_size=test_size, cv=cv,
#                              stratify=stratify,
#                              random_state=random_state)    
# #     while True:
# #         new = SelectKBest(k=nk).fit(X, y).get_support(indices=True)
# #     for n in range(nbootstrap):
#     for n in tqdm_(range(nbootstrap)):
#         new = SelectKBest(k=nk).fit(X, y).get_support(indices=True)
#         while (validate_model(estimator,
#                               X.iloc[:, new], y,
#                                        **validation_params).T.loc[list(set(y))][metric] <
#                validate_model(estimator, X, y,
#                              **validation_params).T.loc[list(set(y))][metric]).any():
#             nk += 1
#             continue
#         else:
#             data.append(np.array((len(new), new)))
#     return (data)


#     yield from data.__iter__() 
#         yield from (SelectKBest(k=nk).fit(X, y).get_support(indices=True)
#                     for n in range(nbootstrap))
#         score = validate_model(estimator, X, y,
#                                **validation_params).T.loc[list(set(y))][metric]
#         new_score = validate_model(estimator, X.iloc[
#                         SelectKBest(k=nk).fit(X, y).get_support(indices=True):, new], y,
#                                    **validation_params).T.loc[list(set(y))][metric]
#         if (new_score < score).any():
#         nk += 1
#         continue
#     else:
#         yield SelectKBest(k=nk).fit(X, y).get_support(indices=True)
#     else:
#         break
#     return X.iloc[:, SelectKBest(k=nk).fit(X, y).get_support(indices=True)]

# test = [kbest_iter(ovrc, X.set_axis(y, axis=0), y,
#                      nk=2, nbootstrap=10)
#         for y in (session.events.trial_type,
#                   session.events.recognition_performance,
#                   session.events.iloc[:, -1])]

#     (X_new for nboot in nbootstrap)
#kbest_iter(estimator, X, y, metric, nk, test_size, cv,stratify, random_state, nbootstrap)
#
# def kbest_iter_bootstrap(estimator,
#                          X: Iterable, y: Iterable,
#                          metric: str = 'precision',
#                          nk: int = 2,
#                          test_size: float = 0.8,
#                          cv: Union[int, callable] = None,
#                          stratify: Iterable = None,
#                          random_state: int = None,
#                          nbootstrap: int = 20,
#                          **kwargs
#                          ) -> Iterable:
    
#     validation_params = dict(test_size=test_size, cv=cv,
#                              stratify=stratify,
#                              random_state=random_state)
# #     @consumer
#     yield from (kbest_iter(estimator, X, y, metric, nk, **validation_params)
#                 for n in tqdm_(range(nbootstrap)))
#     @consumer
#     yield from (pd.DataFrame(kbest_iter(estimator, X, y, metric, nk, **validation_params))
#                 for n in tqdm_(range(nbootstrap)))
#     data = pd.DataFrame(((next(kbest_iter(estimator, X, y, metric, nk,
#                                      **validation_params))
#                           for n in tqdm_(range(nbootstrap)))))
#     most_comm = data[0].value_counts(ascending=False).index.tolist()[0]
#     comm_features = list(set(flatten(idx.tolist() for idx in
#                                      data.set_index(0).loc[most_comm][1].tolist())))
#     better = SelectKBest(k=most_comm).fit(X.iloc[:, comm_features],
#                                           y).get_support(indices=True)
#     return X.iloc[:, better]



# Xnew = flatten(kbest_iter_bootstrap(ovrc, X.set_axis(y, axis=0), y,
#                      nk=2, nbootstrap=10))
# bootstrap_test = [kbest_iter_bootstrap(ovrc, X.set_axis(y, axis=0), y,
#                                        nk=4,
#                                        nbootstrap=20)
#                   for y in (session.events.trial_type,
#                             session.events.recognition_performance,
#                             session.events.iloc[:, -1])]
# display(*bootstrap_test)
# test = [kbest_iter(estimator, X.set_axis(trials, axis=0), y=trials, nk=2)
#         for trials in (session.events.trial_type, session.events.recognition_performance,
#                        session.events.iloc[:, -1])]
#  for nk in range(2, 14)


In [None]:
help(niplot.view_img)

In [None]:
from collections import Counter

svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)


ovrc = OneVsRestClassifier(svc)

def validate_conditions(estimator, X, y, k=2, metric='precision'):
    return validate_model(estimator, X, y, test_sixe=0.8).T.loc[pd.Series(y).unique()][metric]

def init_vs_new(estimator, X, y, k, metric='precision'):
    skb = SelectKBest(k=k)
    X_new = X.iloc[:, skb.fit(X, y).get_support(indices=True)]
    init_scores = validate_conditions(estimator, X, y, metric)
    k_scores = validate_conditions(estimator, X_new, y, metric)
    return (init_scores, k_scores)

# def validate_kbest(estimator, X, y, k=2, metric='precision'):
#     init_scores, k_scores = bool_kbest(estimator, X, y, k, metric)
#     if (init_scores >= k_scores).any():
#         k+=1
#         init_scores, k_scores = bool_kbest(estimator, X, y, k, metric)
#     else:
#         return X_new
# validate_kbest(ovrc, session.computed_.signal_matrix, session.events.trial_type)

In [None]:
from itertools import takewhile

def kbest_per_condition(estimator, X, y, k=2):
#     skb = SelectKBest(k=k)
#     conds = [events[col].tolist() for col in trial_type_cols]
#     k_cond = [k]*len(trial_type_cols)
    
    X_new = X.iloc[:, SelectKBest(k=k).fit(X, y).get_support(indices=True).tolist()]
    takewhile((validate_conditions(estimator, X, y) <
               validate_conditions(estimator, X_new, y)).any(),
              list(range(X.shape[1])))
#         k += 1
#     else:
    return X_new
        
        
#     return init_scores
#     init_scores = [validate_model(estimator, X=X.iloc[:, SelectKBest(k=k,
#                              y=cond, test_size=0.9,
#                              cv=min(Counter(cond).values())
#                              ).T.loc[list(set(cond))].precision
#                    for cond in conds]
#     cond_names = [list(set(cond)) for cond in conds]
#     scores = [validate_model(estimator,
#                              X=X.iloc[:, SelectKBest(k=k_cond[cond[0]]).fit(X, cond[1]).get_support(
#                                              indices=True).tolist()],
#                              y=cond[1], test_size=0.9, stratify=cond[1],
#                              cv=min(Counter(cond[1]).values())
#                                       ).T.loc[list(set(cond[1]))].precision
#              for cond in enumerate(conds)]
#     return X_new
kbest_per_condition(ovrc, X=session.computed_.signal_matrix,
                    y=session.events.recognition_performance)
#                     trial_type_cols=trial_type_cols)

In [None]:
# Initialize estimator
import pingouin as pg
svc = SVC(kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)


ovrc = OneVsRestClassifier(svc)
# ovrc.predict(session.computed_.signal_matrix)
X= session.computed_.signal_matrix
k = 2

X_new = SelectKBest(k=2).fit
result = [validate_model(estimator=ovrc,
                         X=a,
                         y=session.events[col], cv=2,
                         test_size=0.9,
                         stratify=session.events[col],
                         random_state=1).T
          for col in trial_type_cols]
# while 
display(*result)
# best_of = list(set(flatten([SelectKBest(k=2).fit(session.computed_.signal_matrix,
#                                                  session.events[col].tolist()
#                                                 ).get_support(indices=True).tolist()
#                             for col in trial_type_cols])))

# display(X.iloc[:, best_of])


In [None]:
trial_type_cols = ['trial_type', 'recognition_performance', session.events.iloc[:, -1].name]

best_rois = Bunch(**dict(tuple((trial_type_col, [col for col in tqdm_(X.columns) if
                (validate_model(ovrc, X[col].values.reshape(-1, 1),
                               session.events[trial_type_col]).accuracy>0.95).all()])
                       for trial_type_col in trial_type_cols)))

In [None]:
# best_rois = Bunch(**best_rois)

[len(best_rois[key]) for key in list(best_rois.keys())]

In [None]:
a = (pd.DataFrame(MinMaxScaler().fit_transform(StandardScaler().fit_transform(pairwise_distances(X))),
                  index=session.events.recognition_performance,
                  columns=session.events.recognition_performance).corr('spearman').sort_index().T.sort_index())
sns.heatmap(a)
# b = a.copy(deep=True)

# a.corrwith(method='spearman', )

In [None]:
validate_model(X.iloc[SelectKBest().fit(X, )])

In [None]:
# X.filter(regex='hippo')
sns.heatmap(pd.DataFrame(pd.DataFrame(pairwise_distances(X.T)).corr('spearman').values,
             index=grouped_clusters.cluster_ids,
             columns=grouped_clusters.cluster_ids).sort_index().T.sort_index())

#.set_axis(grouped_clusters.cluster_ids, axis=1)))

In [None]:
# help(X.corr)

crit = 1
Xcorr = 1 - np.abs(X.corr('spearman', min_periods=X.shape[0]).values)
np.fill_diagonal(Xcorr, 0)
dist_linkage = hierarchy.ward(squareform(Xcorr))
cluster_ids = hierarchy.fcluster(dist_linkage, crit, criterion="distance")
cluster_ids.shape, np.unique(cluster_ids).shape

grouped_clusters = pd.DataFrame(tuple(zip(cluster_ids, X.columns)),
                                columns=['cluster_ids', 'rois'])#.groupby('cluster_ids')

# clusters = [grouped_clusters.get_group(grp).rois.values for grp in grouped_clusters.groups]

# cluster_ids
# validate_model(ovrc, X=X.filter(regex='hippo'),
#                X=pd.concat([X[cluster].T.mean() for cluster in clusters], axis=1),
#                y=session.events.iloc[:, -1]).accuracy

In [None]:
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import spearmanr
# scipy.spatial.distance.pdist
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import linkage

from scipy.spatial.distance import squareform

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# X, y = cdict2.signal_matrix, session.events.iloc[:, -1].values

def get_optimal_features(X):
    corr = spearmanr(X).correlation
    # Ensure the correlation matrix is symmetric
    corr = (corr + corr.T) / 2
    np.fill_diagonal(corr, 1)

    # We convert the correlation matrix to a distance matrix before performing
    # hierarchical clustering using Ward's linkage.
    distance_matrix = 1 - np.abs(corr)
    dist_linkage = hierarchy.ward(squareform(distance_matrix))

    cluster_ids = hierarchy.fcluster(dist_linkage, 1, criterion="distance")
    return list(enumerate(cluster_ids))
#     cluster_id_to_feature_ids = list#defaultdict(list)

#     for idx, cluster_id in enumerate(cluster_ids):
#         cluster_id_to_feature_ids[cluster_id].append(idx)
#     selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]
#     slected_ids = [v[1] for v in cluster_id_to_feature_ids.values()]

    return selected_features

# spheres_optimal_features = get_optimal_features(spheres_cdict.signal_matrix)
maps_optimal_features = get_optimal_features(session.computed_.signal_matrix)
maps_optimal_features
# X.iloc[:, maps_optimal_features]

In [None]:
display(*session.computed_.weights)

In [None]:
from cimaq_decoding_utils import flatten
flatten([[{1, 'a'}, 'b', {'c':3}]])

In [None]:
# c3 = (session.events[col]

# display(*session.computed_.weighted_matrices)
# best = session.computed_.signal_matrix.iloc[:, maps_optimal_features]

# best
# dist_mats = Bunch(**dict(tuple(col, Bunch(**dict(tuple(
#     (cond, pairwise_distances(best.set_axis(session.events[col], axis=0).loc[cond].T))
#                        for cond in list(set(col))))))
#                    for col in trial_type_cols))

ctl_idx = 'Ctl'

combo_mat = pd.concat([mat for mat in session.computed_.weighted_matrices[1:]])
cond_signals = Bunch(**dict(tuple((cond, combo_mat.loc[cond])
                                  for cond in combo_mat.index.unique())))
cond_imgs = cond_signals.copy()
[cond_imgs.update({item[0]: nimage.mean_img(maps_masker.inverse_transform(item[1]))})
 for item in tqdm_(tuple(cond_imgs.items()),
                   desc='Converting trial type signals back to 4D')]

# cond_imgs = dict(tuple((cond, nimage.mean_img(maps_masker.inverse_transform(combo_mat.loc[cond])))
#                        for cond in tqdm_(combo_mat.index.unique(),
#                                          desc='Converting trial type signals back to 4D')))
# subtracted_imgs = 
# cond_imgs = flatten([[{cond, maps_masker.inverse_transform(session.computed_.weighted_matrices[mat[0]].loc[cond])}
#                       for cond in session.computed_.weighted_matrices[mat[0]].index.unique()]
#                      for mat in tqdm_(enumerate(session.computed_.weighted_matrices[1:], 1),
#                                       desc='Converting trial type signals back to 4D')])

In [None]:
subtracted_signals = dict(tuple((key, cond_signals[key].mean() - cond_signals[ctl_idx].mean())
                                for key in tuple(key for key in cond_signals
                                                 if key != ctl_idx)))
subtracted_imgs = subtracted_signals.copy()
[subtracted_imgs.update({item[0]: nimage.mean_img(maps_masker.inverse_transform(item[1]))})
 for item in tqdm_(tuple(subtracted_imgs.items()),
                   desc='Converting subtracted signals back to 4D')]

In [None]:
cond_signals['Enc'].mean()# - cond_si

In [None]:
display(*[niplot.view_img(nimage.mean_img(item[1]), dim=-1, opacity=0.8,
                          title=item[0], bg_img=session.anat_img)
          for item in subtracted_imgs.items()])

In [None]:
display(*[niplot.view_img(nimage.mean_img(item[1]), dim=-1, opacity=0.8,
                          title=item[0], bg_img=session.anat_img)
          for item in cond_imgs.items()])

In [None]:
# sns.heatmap(list(dist_mats.values())[0])
sns.heatmap(pd.DataFrame(((pd.DataFrame(val).mean() for val in
                           list(dist_mats.values()))),
                         index=list(dist_mats.keys())).T)

In [None]:
# ndims = 1024
# difumo_cut_coords = pd.read_csv(f'/data/simexp/fnadeau/difumo_{ndims}_dims_3mm_cut_coords.tsv',
#                                 sep='\t', index_col='component')
# # difumo_cut_coords
# most_pred = '|'.join(['retrocalcarine cortex rh', 'precentral sulcus mid-inferior rh',
#                       'paracingulate gyrus posterior', 'inferior temporal sulcus anterior rh',
#                       'superior frontal sulcus mid-posterior lh', 'calcarine cortex mid-posterior rh',
#                       'middle frontal gyrus superior rh', 'superior parietal lobule anterior lh',
#                       'hippocampal fissure', 'cingulate anterior', 'subcentral gyrus rh',
#                       'frontal pole lateral rh', 'postcentral sulcus middle lh',
#                       'lateral fissure posterior limb lh', 'posterior cingulate cortex anterior',
#                       'middle frontal gyrus mid-posterior inferior rh',
#                       'intracalcarine cortex', 'fusiform gyrus anterior rh'])

# maps_path = '/data/simexp/fnadeau/nilearn_atlases/difumo_atlases/1024/3mm/maps.nii.gz'
# most_pred_idx = difumo_cut_coords.reset_index().set_index(
#                     'difumo_names').loc[most_pred.split('|')].component
# feature_names = difumo_cut_coords.loc[most_pred_idx].difumo_names
# maps_indexes = [idx -1 for idx in most_pred_idx]

# new_difumo = nimage.index_img(nimage.load_img(maps_path), most_pred_idx)


In [None]:
display(*[niplot.plot_prob_atlas(nimage.index_img(di1024.maps, [0])),
          niplot.view_img(nimage.mean_img(nimage.index_img(di1024.maps, [0])))])

In [None]:


# di2 = difumo_cut_coords.copy(deep=True).set_index('difumo_names').T

# wm =  di2.filter(wm_rois).columns.tolist()
# not_cortex = di2.filter(regex='|'.join([foremidbrain_rois, wm_rois])).columns.tolist()
# di2 = di2.drop(not_cortex, axis=1)

# cortex_rois = difumo_cut_coords.reset_index(drop=False).set_index(
#                   'difumo_names', drop=False).loc[di2.columns].component.values



# maps_masker = NiftiMapsMasker(maps_img=new_difumo,
#                               mask_img=session.mask_img,
#                               resampling_target='mask',
#                               t_r=get_t_r(session.fmri_img),
#                               **masker_defs).fit()

# from nilearn import plotting as niplot

# new_seeds = niplot.find_probabilistic_atlas_cut_coords(maps_masker.maps_img_)

# spheres_masker = NiftiSpheresMasker(new_seeds,
#                                     mask_img=session.mask_img,
#                                     t_r=get_t_r(session.fmri_img),
#                                     **masker_defs).fit()


In [None]:
# NiftiMapsMasker Vs NiftiSpheresMasker Resampling Fit Test



# spheres_cdict = get_all_contrasts(fmri_img=session.fmri_img,
#                                   events=session.events,
#                                   masker=spheres_masker,
#                                   output_type='effect_size',
#                                   trial_type_cols=trial_type_cols,
#                                   glm_kws=session.glm_defs,
#                                   design_kws=session.design_defs,
#                                   feature_labels=di2.columns.tolist())

# session.computed_ = get_all_contrasts(fmri_img=session.fmri_img,
#                                events=session.events,
#                                masker=maps_masker,
#                                output_type='effect_size',
#                                trial_type_cols=trial_type_cols,
#                                glm_kws=session.glm_defs,
#                                design_kws=session.design_defs,
#                                feature_labels=di2.columns.tolist())

In [None]:
feature_names.iloc[maps_optimal_features]

In [None]:
opt_seeds = niplot.find_probabilistic_atlas_cut_coords(
                nimage.index_img(new_difumo, spheres_optimal_features))
opt_maps = nimage.index_img(new_difumo, maps_optimal_features)

opt_spheres_masker = NiftiSpheresMasker(opt_seeds,
                                        mask_img=session.mask_img,
                                        t_r=get_t_r(session.fmri_img),
                                        **masker_defs).fit()

opt_maps_masker = NiftiMapsMasker(maps_img=opt_maps,
                                  mask_img=session.mask_img,
                                  resampling_target='mask',
                                  t_r=get_t_r(session.fmri_img),
                                  **masker_defs).fit()

opt_spheres_cdict = get_all_contrasts(fmri_img=session.fmri_img,
                                  events=session.events,
                                  masker=spheres_masker,
                                  output_type='effect_size',
                                  trial_type_cols=trial_type_cols,
                                  glm_kws=session.glm_defs,
                                  design_kws=session.design_defs,
                                  feature_labels=di2.columns.tolist())

opt_session.computed_ = get_all_contrasts(fmri_img=session.fmri_img,
                               events=session.events,
                               masker=maps_masker,
                               output_type='effect_size',
                               trial_type_cols=trial_type_cols,
                               glm_kws=session.glm_defs,
                               design_kws=session.design_defs,
                               feature_labels=di2.columns.tolist())


In [None]:
opt_ovrc = OneVsRestClassifier(SVC(tol=0.0001, kernel='linear',
                                   cache_size=5,
                                   decision_function_shape='ovr',
                                   class_weight='balanced',
                                   probability=True))

opt_spheres_acc = validate_model(opt_ovrc, X=opt_spheres_cdict.signal_matrix,
                                 y=session.events.iloc[:, -1],
                                 test_size=0.5,
                                 stratify=session.events.iloc[:, -1])

opt_maps_acc = validate_model(opt_ovrc, X=opt_session.computed_.signal_matrix,
                                  y=session.events.iloc[:, -1],
                                  test_size=0.5,
                                  stratify=session.events.iloc[:, -1])

display(opt_spheres_acc, opt_maps_acc)

In [None]:
len(spheres_optimal_features), len(maps_optimal_features)

common_opt_features = list(set(spheres_optimal_features).intersection(set(maps_optimal_features)))

common_seeds = niplot.find_probabilistic_atlas_cut_coords(
                nimage.index_img(new_difumo, common_opt_features))
common_maps = nimage.index_img(new_difumo, common_opt_features)

print(len(common_opt_features), common_seeds.shape, common_maps.shape)

common_spheres_masker = NiftiSpheresMasker(common_seeds,
                                           mask_img=session.mask_img,
                                           t_r=get_t_r(session.fmri_img),
                                           **masker_defs).fit()

common_maps_masker = NiftiMapsMasker(maps_img=common_maps,
                                     mask_img=session.mask_img,
                                     resampling_target='mask',
                                     t_r=get_t_r(session.fmri_img),
                                     **masker_defs).fit()



In [None]:
common_roi_labels = di2.iloc[:, common_opt_features].columns.tolist()

common_spheres_cdict = get_all_contrasts(fmri_img=session.fmri_img,
                                  events=session.events,
                                  masker=common_spheres_masker,
                                  output_type='effect_size',
                                  trial_type_cols=trial_type_cols,
                                  glm_kws=session.glm_defs,
                                  design_kws=session.design_defs,
                                  feature_labels=common_roi_labels)


common_session.computed_ = get_all_contrasts(fmri_img=session.fmri_img,
                               events=session.events,
                               masker=common_maps_masker,
                               output_type='effect_size',
                               trial_type_cols=trial_type_cols,
                               glm_kws=session.glm_defs,
                               design_kws=session.design_defs,
                               feature_labels=common_roi_labels)


In [None]:


common_ovrc = OneVsRestClassifier(SVC(tol=0.0001, kernel='linear',
                                   cache_size=5,
                                   decision_function_shape='ovr',
                                   class_weight='balanced',
                                   probability=True))

common_spheres_acc = validate_model(common_ovrc, X=common_spheres_cdict.signal_matrix,
                                 y=session.events.iloc[:, -1],
                                 test_size=0.5,
                                 stratify=session.events.iloc[:, -1])

common_maps_acc = validate_model(common_ovrc, X=common_session.computed_.signal_matrix,
                                  y=session.events.iloc[:, -1],
                                  test_size=0.5,
                                  stratify=session.events.iloc[:, -1])

display(common_spheres_acc, common_maps_acc)

In [None]:
from sklearn.preprocessing import MaxAbsScaler

common_maps_inv = common_maps_masker.inverse_transform(MinMaxScaler().fit_transform(common_session.computed_.signal_matrix))
# common_spheres_inv = common_spheres_masker.inverse_transform(common_spheres_cdict.signal_matrix)

In [None]:
treshold = nimage.mean_img(common_maps_inv).get_fdata().min()
treshold

In [None]:
niplot.view_img(nimage.mean_img(common_maps_inv), title='18 Most Predictive ROIs',
                display_mode='mosaic', opacity=0.8,
#                 treshold=treshold,
                symmetric_cmap=False,
                bg_img=nimage.load_img(sessions[0].anat_path))

In [None]:
# niplot.view_img()
niplot.plot_prob_atlas(common_maps_inv, title='18 Most Predictive ROIs',
                       display_mode='mosaic',
#                        opacity=0.8,
                       bg_img=nimage.load_img(sessions[0].anat_path))


    

In [None]:
common_roi_labels

In [None]:
display(*[niplot.view_img(inverse_imgs.ctl_miss_ws_cs[trial_type],
                                    title=trial_type, dim=-1,
                                    symmetric_cmap=False, opacity=0.8,
                                    cut_coords=(42, 19, -8),
                                    bg_img=nimage.load_img(sessions[0].anat_path))
          for trial_type in inverse_imgs.ctl_miss_ws_cs])

In [None]:
common_spheres_optimal_features = get_optimal_features(common_spheres_cdict.signal_matrix)
common_maps_optimal_features = get_optimal_features(common_session.computed_.signal_matrix)

n_opt_sphres = len(common_spheres_optimal_features)
n_opt_maps = len(common_maps_optimal_features)

opt_common_features = set(common_spheres_optimal_features).intersection(
                          set(common_maps_optimal_features))
n_opt_common = len(opt_common_features)

display(n_opt_sphres, n_opt_maps, n_opt_common)

In [None]:
common_maps_optimal_features

In [None]:
#     clf = RandomForestClassifier(n_estimators=X_train.shape[0], random_state=42)
#     clf.fit(X_train, y_train)

#     result = permutation_importance(clf, X_train, y_train, n_repeats=2, random_state=42)
#     perm_sorted_idx = np.argsort(result.importances_mean)

#     tree_importance_sorted_idx = np.argsort(clf.feature_importances_)
#     tree_indices = np.arange(0, len(clf.feature_importances_)) + 0.5

#     # print("Accuracy on test data: {:.2f}".format(clf.score(X_test, y_test)))
#     feature_importance_sorted = clf.feature_importances_[tree_importance_sorted_idx]
#     X_train, X_test, y_train, y_test = train_test_split(X, y,
#                                                         test_size=,
#                                                         random_state=random_state)

In [None]:
spheres_cdict.signal_matrix.isna().any().all()

In [None]:
from scipy.stats import kendalltau, pearsonr, spearmanr

X = spheres_cdict.signal_matrix

corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)
np.nan_to_num(corr, 0)
# corr = corr.fillna(0)
distance_matrix = 1 - np.abs(corr)

dist_linkage = hierarchy.ward(squareform(distance_matrix))
# help(permutation_importance)

permutation_importance(estimator=ovrc, X=X, y=session.events.iloc[:,-1],
                       scoring='accuracy')
# pd_corr = pd.DataFrame(X.values, index=None).corr(method='spearman')

# methods = {'kendall': kendalltau, 'pearson': pearsonr, 'spearman': spearmanr}
# method = 'spearman'

# (X.corr('spearman') + X.corr('spearman').T) /2
# == pd.DataFrame(X.values, index=None).corrwith(
#     X.values.T ,method='spearman')
# corr = spearmanr(X).correlation
# # corr.shape
# 
# corr.shape
# 
# 
# # help(X.corr)
# help(scipy.stats.spearmanr)
# corr.shape
# 
# distance_matrix.shape
# 
# cluster_ids = hierarchy.fcluster(dist_linkage, 1, criterion="distance")

# display(corr.shape, distance_matrix.shape, dist_linkage.shape, cluster_ids.shape)

In [None]:
spheres_difumo = 
spheres_optimal_features

In [None]:
len([idx for idx in spheres_optimal_features if idx not in maps_optimal_features])
from nilearn import plotting as niplot

In [None]:
end_img = maps_masker.inverse_transform(cdict2.signal_matrix)

In [None]:
display(niplot.view_img(nimage.mean_img(nimage.index_img(end_img,
                                                         maps_optimal_features)),
                        title='Maps Masker ROIs',
                        dim=-1, symmetric_cmap=False, opacity=0.8,
                        bg_img=nimage.load_img(sessions[0].anat_path)))

In [None]:
best_difumo = nimage.index_img(new_difumo, maps_optimal_features)

best_maps_masker = NiftiMapsMasker(maps_img=best_difumo,
#                               mask_img=sessions[0].mask_img,
                              resampling_target='maps',
#                               t_r=2.5,
                              standardize_confounds=False,
                              smoothing_fwhm=None,
                              high_variance_confounds=False,
                              detrend=False, dtype='f',
                              low_pass=None, high_pass=None,
                              allow_overlap=True,
                              standardize=False).fit()



In [None]:
best_img = best_maps_masker.inverse_transform(session.computed_.signal_matrix.iloc[:, maps_optimal_features])

In [None]:
display(niplot.view_img(nimage.mean_img(best_img),
                        title='Spheres Masker ROIs',
                        dim=-1, symmetric_cmap=False, opacity=0.8,
                        treshold='auto',
                        bg_img=nimage.load_img(sessions[0].anat_path)))

In [None]:
X = cdict2.signal_matrix.set_index(sessions[0].events.ctl_miss_ws_cs).reset_index()
ctl_idx = X[X.ctl_miss_ws_cs=='Ctl'].index
enc_idx = X[X.ctl_miss_ws_cs=='Enc'].index

end_ctl = nimage.index_img(end_img, ctl_idx)
end_enc = nimage.index_img(end_img, enc_idx)

In [None]:

nilearn.plotting.view_img(nimage.mean_img(end_img), dim=-1,
                           symmetric_cmap=False,
                           opacity=0.8,
                           bg_img=nimage.load_img(sessions[0].anat_path))
#  for img in (end_ctl, end_enc)]
# select_end_img = nimage.index_img(end_img, optimal_features)
## resampling_target = 'data'
# end_img.shape = (104, 123, 104, 117)

In [None]:
# d1024 = Path('/data/simexp/fnadeau/nilearn_atlases/difumo_atlases/1024/labels_1024_dictionary.csv')
# from io import StringIO

# difumo1064_labels = pd.read_csv(StringIO(d1024.read_bytes().lower().decode()),
#                                 index_col='component')
# difumo1064_labels.to_csv(d1024)
#['difumo_names'].value_counts()>1).replace({False:np.nan}).dropna()
# help(NiftiMapsMasker)

In [None]:
trial_type_cols = ['trial_type', 'recognition_performance', 'position_performance']
trial_types = get_glm_events(sessions[0].events, trial_type_cols=trial_type_cols)
display(*designs)

In [None]:
# sorted(Path('/data/simexp/fnadeau/wmstim/').iterdir())


In [None]:
trial_type_cols=['trial_type', 'recognition_performance',
                 'ctl_miss_ws_cs']

cdict2['contrasts'] = [cdict2[key].contrast_img for key in
                       tuple(cdict2.keys())[:1+len(trial_type_cols)]]


In [None]:
inverse_imgs = Bunch(**dict(tuple((key, Bunch(**dict(tuple((idx, maps_masker.inverse_transform(
                   cdict2[key].signals.loc[idx])) for idx in tqdm_(cdict2[key].signals.index)))))
                                  for key in tuple(cdict2.keys())[:1+len(trial_type_cols)])))


In [None]:
inverse_imgs.keys()

In [None]:
nilearn.plotting.view_img(inverse_imgs.trial_type.Enc,
                          dim=-1,
                          symmetric_cmap=False,
                          opacity=0.8,
                          bg_img=nimage.load_img(sessions[0].anat_path))

In [None]:
nilearn.plotting.view_img(inverse_imgs.trial_type.Ctl,
                          dim=-1, symmetric_cmap=False,
                          opacity=0.8, cut_coords=(-10, -30, 5),
                          bg_img=nimage.load_img(sessions[0].anat_path))

In [None]:
display(*[nilearn.plotting.view_img(inverse_imgs.ctl_miss_ws_cs[trial_type],
                                    title=trial_type, dim=-1,
                                    symmetric_cmap=False, opacity=0.8,
                                    cut_coords=(42, 19, -8),
                                    bg_img=nimage.load_img(sessions[0].anat_path))
          for trial_type in inverse_imgs.ctl_miss_ws_cs])

In [None]:
# [cdict2 for key in tuple(cdict2.keys())[1:]]
cdict2.keys()

In [None]:
all_ctl_signals = [cdict2[key].signals.loc['Ctl'] for key in tuple(cdict2.keys())[1: 1+len(trial_type_cols)]]
all_ctl_imgs = [maps_masker.inverse_transform(sigs) for sigs in all_ctl_signals]

In [None]:
display(*[nilearn.plotting.view_img(ctl_img,
                                    title='Control Trials', dim=-1,
                                    symmetric_cmap=False, opacity=0.8,
                                    cut_coords=(42, 19, -8),
                                    bg_img=nimage.load_img(sessions[0].anat_path))
          for ctl_img in all_ctl_imgs])

In [None]:
results_signals = cdict2.signal_matrix.set_index(sessions[0].events.ctl_miss_ws_cs)
results_conditions = results_signals.index.unique()
results_imgs = [maps_masker.inverse_transform(results_signals.loc[cond])
                for cond in results_conditions]


In [None]:
display(*[nilearn.plotting.view_img(nimage.mean_img(results_imgs[cond[0]]),
                                    title=cond[1], dim=-1,
                                    symmetric_cmap=False, opacity=0.8,
                                    cut_coords=(-38, 16, -25),
                                    bg_img=nimage.load_img(sessions[0].anat_path))
          for cond in enumerate(results_conditions)])

In [None]:
select_signals = cdict2.signal_matrix.iloc[:, selected_features]
select_names = di2[select_signals.columns].columns
select_idx = difumo_cut_coords.reset_index(drop=False).set_index(
                 'difumo_names').T[select_names.tolist()].loc['component'].values.astype(int)
select_idx
# maps_path = '/data/simexp/fnadeau/nilearn_atlases/difumo_atlases/1024/3mm/maps.nii.gz'
# select_difumo = nimage.index_img(nimage.load_img(maps_path), select_idx)

# select_masker = NiftiMapsMasker(maps_img=select_difumo,
#                               mask_img=sessions[0].mask_img,
#                               resampling_target='mask',
#                               t_r=2.5,
#                               standardize_confounds=False,
#                               smoothing_fwhm=None,
#                               high_variance_confounds=False,
#                               detrend=False, dtype='f',
#                               low_pass=None, high_pass=None,
#                               allow_overlap=True,
#                               standardize=False).fit()


In [None]:
# setattr(maps_masker, 'sample_mask', select_idx)
select_img = nimage.index_img(maps_masker.inverse_transform(cdict2.signal_matrix.values.T),
                              select_idx)


In [None]:
nilearn.plotting.view_img(nimage.mean_img(select_img),
                          title='Mean Most Significant Trials', dim=-1,
                          symmetric_cmap=False, opacity=0.8,
#                             cut_coords=(42, 19, -8),
                            bg_img=nimage.load_img(sessions[0].anat_path))

In [None]:
distance_img = select_masker.inverse_transform(pwdist_signals)

In [None]:
nilearn.plotting.view_img(nimage.mean_img(results_imgs[cond[0]]),
                                    title=cond[1], dim=-1,
                                    symmetric_cmap=False, opacity=0.8,
                                    cut_coords=(-38, 16, -25),
                                    bg_img=nimage.load_img(sessions[0].anat_path))

In [None]:
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler
from sklearn.pipeline import Pipeline

pipe_components = pd.DataFrame(((True, 'standardize', StandardScaler()),
                                 (True, 'maximize', MaxAbsScaler()),
                                (True, 'scale', MinMaxScaler()))).set_index(0)
pipe_components = [tuple(item) for item in
                   pipe_components.loc[pipe_components.index==True].values.tolist()]
pipeline = Pipeline(pipe_components)
pipeline

In [None]:
src = '/data/simexp/fnadeau/best_features/'

roi_tables = sorted(Path(src).rglob('*task-memory_bestfeatures.tsv'))
test = pd.concat([pd.Series(pd.read_csv(apath, sep='\t',
                                        index_col='trial_type').columns)
                  for apath in roi_tables])
test.value_counts(ascending=False).head(15)

In [None]:
import numpy as np
from neurora.stuff import limtozero
import math
import sys
from scipy.stats import pearsonr
from neurora.stuff import show_progressbar
from neurora.decoding import tbyt_decoding_kfold
from nilearn.masking import apply_mask


def fmriRDM_roi(fmri_data, mask_data,
                sub_opt: bool = True,
                method: str = "correlation",
                abs: bool = False):

    """
    Calculate the ROI-based fMRI Representational Dissimilarity Matrix - RDM(s).

    Args:
        fmri_data : array
            The fMRI data. The shape of ``fmri_data`` must be
            [n_cons, n_subs, nx, ny, nz], respectively representing the
            number of conditions, subjects & the size of fMRI-img.

        mask_data : array [nx, ny, nz].
            The mask data for region of interest (ROI).
            The size of the fMRI image (nx, ny, nz) represent the number
            of voxels along the x, y, z axis.

        sub_opt: bool (Default = True).
            If sub_opt is True, return the results of each subject.
            Otherwise, return the average result.

        method : str {'correlation' | 'euclidean'} (Default = 'correlation').
            The method to calculate the dissimilarities, where 'correlation'
            means Pearson Correlation and 'euclidean' means Euclidean Distance.
            The results will be normalized.

        abs : bool (Default = True).
            Calculate the absolute value of Pearson r or not.

    Returns:
        RDM : array
            The fMRI-ROI RDM.
            If sub_opt is True, the shape of RDM is [n_subs, n_cons, n_cons].
            Otherwise, the shape of RDM is [n_cons, n_cons].

    Notes:
        The sizes (nx, ny, nz) of fmri_data and mask_data should be same.
    """

    input_shape_err = " ".join(["\nThe shape of inputs (fmri_data & mask_data)",
                                  "for fmriRDM_roi() function should be:\n",
                                  "[n_cons, n_subs, nx, ny, nz] &\n",
                                  "[nx, ny, nz], respectively.\n"])

    if len(np.shape(fmri_data)) != 5 or len(np.shape(mask_data)) != 3:
        sys.exit(input_shape_err)

    # get the number of conditions, subjects, the size of the fMRI-img
    ncons, nsubs, nx, ny, nz = fmri_data.shape
    
    #Masking
    

    # initialize the RDMs
    subrdms = np.zeros([nsubs, ncons, ncons], dtype=np.float)

    # shape of data: [ncons, nsubs, n] -> [nsubs, ncons, n]
    data = np.transpose(data, (1, 0, 2))

    # calculate the values in RDM
    for sub in range(nsubs):
        f
        for i in range(ncons):
            for j in range(ncons):

                if np.isnan(data).any():
                    if method == 'correlation':
                        # calculate the Pearson Coefficient
                        r = pearsonr(data[sub, i], data[sub, j])[0]
                        # calculate the dissimilarity
                        if abs == True:
                            subrdms[sub, i, j] = limtozero(1 - np.abs(r))
                        else:
                            subrdms[sub, i, j] = limtozero(1 - r)
                    elif method == 'euclidean':
                        subrdms[sub, i, j] = np.linalg.norm(data[sub, i] - data[sub, j])
                    """elif method == 'mahalanobis':
                        X = np.transpose(np.vstack((data[sub, i], data[sub, j])), (1, 0))
                        X = np.dot(X, np.linalg.inv(np.cov(X, rowvar=False)))
                        subrdms[sub, i, j] = np.linalg.norm(X[:, 0] - X[:, 1])"""
        if method == 'euclidean':
            data_max, data_min = np.max(subrdms[sub]), np.min(subrdms[sub])
            subrdms[sub] = (subrdms[sub] - data_min) / (data_max - data_min)

    # average the RDMs
    rdm = np.average(subrdms, axis=0)

    if sub_opt is True:
        print("RDMs computing finished!")
        return subrdms

    print("RDM computing finished!")
    return rdm

In [None]:
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

dst = '/data/simexp/fnadeau/best_features/'
for session in tqdm_(sessions):
    cdict2 = get_all_contrasts(fmri_img=session.fmri_img,
                               events=session.events,
                               masker=spheres_masker,
                               output_type='effect_size',
                               trial_type_cols=['trial_type',
                                                'recognition_performance',
                                                'ctl_miss_ws_cs'],
                               glm_kws=session.glm_defs,
                               design_kws=session.design_defs,
                               feature_labels=di2.columns.tolist())

    X, y = cdict2.signal_matrix, session.events.iloc[:, -1].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                        random_state=42)

    clf = RandomForestClassifier(n_estimators=X_train.shape[0], random_state=42)
    clf.fit(X_train, y_train)

    result = permutation_importance(clf, X_train, y_train, n_repeats=2, random_state=42)
    perm_sorted_idx = np.argsort(result.importances_mean)

    tree_importance_sorted_idx = np.argsort(clf.feature_importances_)
    tree_indices = np.arange(0, len(clf.feature_importances_)) + 0.5

    # print("Accuracy on test data: {:.2f}".format(clf.score(X_test, y_test)))
    feature_importance_sorted = clf.feature_importances_[tree_importance_sorted_idx]

    corr = spearmanr(X).correlation

    # Ensure the correlation matrix is symmetric
    corr = (corr + corr.T) / 2
    np.fill_diagonal(corr, 1)

    # We convert the correlation matrix to a distance matrix before performing
    # hierarchical clustering using Ward's linkage.
    distance_matrix = 1 - np.abs(corr)
    dist_linkage = hierarchy.ward(squareform(distance_matrix))

    cluster_ids = hierarchy.fcluster(dist_linkage, 1, criterion="distance")
    cluster_id_to_feature_ids = defaultdict(list)

    for idx, cluster_id in enumerate(cluster_ids):
        cluster_id_to_feature_ids[cluster_id].append(idx)
    selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]

    X_train_sel = X_train.iloc[:, selected_features]
    X_test_sel = X_test.iloc[:, selected_features]

    clf_sel = RandomForestClassifier(n_estimators=X_train_sel.shape[0],
                                     random_state=42)
    clf_sel.fit(X_train_sel, y_train)
    end = 'task-memory_bestfeatures'
    top_features = cdict2.signal_matrix.iloc[:, selected_features]

    top_features.to_csv(os.path.join(dst,f'{session.sub_id}_{session.ses_id}_{end}.tsv'),
                        sep='\t', index='trial_type', encoding='UTF-8-SIG')

    goodfeatures = top_features#.iloc[:, selected_features]

    selected_distances = pd.DataFrame(scipy.stats.spearmanr(pairwise_distances(goodfeatures.T.values))[0],
                                      index=goodfeatures.columns,
                                      columns=goodfeatures.columns)

    masked_distances = selected_distances.where(selected_distances.values !=
                                                np.tril(selected_distances.values))
    masked_distances = masked_distances.where(masked_distances.values!=0)

    masked_distances = masked_distances.where(~np.array([row[1].fillna(0).between(-0.50, 0.50).values
                                                         for row in masked_distances.iterrows()]))
    masked_distances = masked_distances.dropna(axis=1, how='all').dropna(axis=0, how='all')

    the_heatmap = sns.heatmap(masked_distances).get_figure().savefig(os.path.join(dst,
        f'{session.sub_id}_{session.ses_id}_{end}_heatmap.jpg'))
    
    trial_type_labels = [session.events.trial_type, session.events.recognition_performance,
                         session.events.ctl_miss_ws_cs]
    
    svc = SVC(tol=0.0001, kernel='linear', cache_size=5,
          decision_function_shape='ovr', class_weight='balanced',
          probability=True)

    ovrc = OneVsRestClassifier(svc)
    suffix = 'classification-report.tsv'
    try:
        cr_results = [validate_model(ovrc, top_features, y=ttlabels)
                      for ttlabels in trial_type_labels]

        [cr_results[col[0]].to_csv(os.path.join(dst, f'{session.sub_id}_{session.ses_id}_{col[1].name}-'+suffix),
                        sep='\t', index=True, encoding='UTF-8-SIG')
         for col in enumerate(trial_type_labels)]
    except ValueError:
        continue


In [None]:
# def weightings(signals: pd.DataFrame,
#                weights: pd.DataFrame,
#                ) -> pd.DataFrame:
# #                weight_labels: Union[list, pd.Index] = None
#     """
#     Return condition-wise weighted signals DataFrame.
#     """

#     newsignals = signals.copy(deep=True)#.set_axis(condition_labels, axis=0)
# #     if weight_labels is not None:
# #         weights = weights.set_axis(weight_labels, axis=0)
#     for cond in weights.index.unique():
#         newsignals.loc[cond] = newsignals.loc[cond]*weights.loc[cond]
#     return newsignals


# def get_weighted_signals(signals: pd.DataFrame,
#                          weights: list,
#                          standardize: bool = True,
#                          **kwargs
#                          ) -> pd.DataFrame:
#     """
#     Return condition-wise weighted signals DataFrame for all conditions.
    
#     Args:
#         signals
#     """

#     orig_index, orig_cols = signals.index, signals.columns
# #     if weight_labels is None:
# #         weight_labels = [[]] * len(weights)
# # weight_labels=weight_labels[weight[0]]
#     w0 = [weightings(signals, weights=weights[weight[0]])
#           for weight in enumerate(weights)]
#     slist = [signals]+w0
#     data = np.prod(np.array(slist), axis=0)
#     if standardize is True:
#         data = StandardScaler().fit_transform(data)
#     return pd.DataFrame(data, index=orig_index, columns=orig_cols)

# weighted_signals = [weightings(item[0], item[1]) for item in
#                     tuple(zip(cdict2.matrices[1:], cdict2.weights))]


In [None]:
# weighted_signals = [weightings(item[0], item[1]) for item in
#                     tuple(zip(cdict2.matrices[1:], cdict2.weights))]

# cond_weights = pd.DataFrame(StandardScaler().fit_transform(np.prod(np.array(weighted_signals), axis=0)),
#                             index=cdict2.matrices[0].index, columns=cdict2.matrices[0].columns)
                            

In [None]:
# all(result.importances_mean.argsort()==np.argsort(result.importances_mean))
# all(clf.feature_importances_.argsort()==np.argsort(clf.feature_importances_))
# sorted(result.keys())
# np.argsort(result.importances)==np.argsort(clf.feature_importances_)

In [None]:
# display(tree_importance_sorted_idx, tree_importance_sorted_idx.shape,
#         tree_indices, tree_indices.shape)

In [None]:

# display(sorted(clf.__dict__.keys()),
#         result, perm_sorted_idx, feature_importance_sorted,
#         feature_importance_sorted.shape)
# display(np.argsort(clf.feature_importances_))
# feature_importance_sorted

In [None]:
# result.importances[perm_sorted_idx].T.shape
# help(np.argsort)
# perm_sorted_idx
# X.iloc[:, perm_sorted_idx]

In [None]:
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(512, 256))
# ax1.barh(tree_indices, feature_importance_sorted, height=0.7)
# ax1.set_yticks(tree_indices)
# ax1.set_yticklabels(X.iloc[:, tree_importance_sorted_idx].columns)
# ax1.set_ylim((0, len(clf.feature_importances_)))
# ax2.boxplot(result.importances[perm_sorted_idx].T,
#             vert=False,
#             labels=X.iloc[:, perm_sorted_idx].columns)
# fig.tight_layout()
# plt.show()

In [None]:
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(512, 256))
# corr = spearmanr(X).correlation

# # Ensure the correlation matrix is symmetric
# corr = (corr + corr.T) / 2
# np.fill_diagonal(corr, 1)

# # We convert the correlation matrix to a distance matrix before performing
# # hierarchical clustering using Ward's linkage.
# distance_matrix = 1 - np.abs(corr)
# dist_linkage = hierarchy.ward(squareform(distance_matrix))

# dendro = hierarchy.dendrogram(dist_linkage, labels=X.columns.tolist(),
#                               ax=ax1, leaf_rotation=90)
# dendro_idx = np.arange(0, len(dendro["ivl"]))

# ax2.imshow(corr[dendro["leaves"], :][:, dendro["leaves"]])
# ax2.set_xticks(dendro_idx)
# ax2.set_yticks(dendro_idx)
# ax2.set_xticklabels(dendro["ivl"], rotation="vertical")
# ax2.set_yticklabels(dendro["ivl"])
# fig.tight_layout()
# plt.show()
# plt.savefig('/data/simexp/fnadeau/hierarchical_clustering_dendrogram.jpg')

In [None]:
corr = spearmanr(X).correlation

# Ensure the correlation matrix is symmetric
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)

# We convert the correlation matrix to a distance matrix before performing
# hierarchical clustering using Ward's linkage.
distance_matrix = 1 - np.abs(corr)
dist_linkage = hierarchy.ward(squareform(distance_matrix))

# dendro = hierarchy.dendrogram(dist_linkage, labels=X.columns.tolist(),
#                               ax=ax1, leaf_rotation=90)
# dendro_idx = np.arange(0, len(dendro["ivl"]))

cluster_ids = hierarchy.fcluster(dist_linkage, 1, criterion="distance")
cluster_id_to_feature_ids = defaultdict(list)

for idx, cluster_id in enumerate(cluster_ids):
    cluster_id_to_feature_ids[cluster_id].append(idx)
selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]

X_train_sel = X_train.iloc[:, selected_features]
X_test_sel = X_test.iloc[:, selected_features]

clf_sel = RandomForestClassifier(n_estimators=X_train_sel.shape[0],
                                 random_state=42)
clf_sel.fit(X_train_sel, y_train)
print(
    "Accuracy on test data with features removed: {:.2f}".format(
        clf_sel.score(X_test_sel, y_test)
    )
)

In [None]:
# os.mkdir('/data/simexp/fnadeau/best_features/')

In [None]:
# top_features

In [None]:
top_features = cdict2.signal_matrix.iloc[:, selected_features]

top_features.to_csv(f'/data/simexp/fnadeau/best_features/{sess00.sub_id}_{sess00.ses_id}_task-memory_bestfeatures.tsv',
                    sep='\t', index='trial_type', encoding='UTF-8-SIG')
# top_features.columns.tolist()
# top_features.iloc[:, 0].unique().shape

In [None]:
goodfeatures = top_features#.iloc[:, selected_features]

selected_distances = pd.DataFrame(scipy.stats.spearmanr(pairwise_distances(goodfeatures.T.values))[0],
                                  index=goodfeatures.columns,
                                  columns=goodfeatures.columns)

masked_distances = selected_distances.where(selected_distances.values !=
                                            np.tril(selected_distances.values))
masked_distances = masked_distances.where(masked_distances.values!=0)

masked_distances = masked_distances.where(~np.array([row[1].fillna(0).between(-0.50, 0.50).values for row
                                           in masked_distances.iterrows()]))
masked_distances = masked_distances.dropna(axis=1, how='all').dropna(axis=0, how='all')

the_heatmap = sns.heatmap(masked_distances)
the_heatmap.get_figure().savefig(f'/data/simexp/fnadeau/best_features/{sess00.sub_id}_{sess00.ses_id}_task-memory_bestfeatures_heatmap.jpg') 

In [None]:
masked_distances.min()

In [None]:

# def weightings(signals: pd.DataFrame,
#                weights: pd.DataFrame,
#                condition_labels: Union[list, pd.Index]
#                ) -> pd.DataFrame:
# #                weight_labels: Union[list, pd.Index] = None
#     """
#     Return condition-wise weighted signals DataFrame.
#     """

#     newsignals = signals.set_axis(condition_labels, axis=0)
# #     if weight_labels is not None:
# #         weights = weights.set_axis(weight_labels, axis=0)
#     for cond in weights.index.unique():
#         newsignals.loc[cond] = newsignals.loc[cond]*weights.loc[cond]
#     return newsignals


# def get_weighted_signals(signals: pd.DataFrame,
#                          weights: list,
#                          condition_labels: Union[list, pd.Index],
# #                          weight_labels: Union[list, pd.Index] = None,
#                          standardize: bool = True,
#                          **kwargs
#                          ) -> pd.DataFrame:
#     """
#     Return condition-wise weighted signals DataFrame for all conditions.
    
#     Args:
#         signals
#     """

#     orig_index, orig_cols = signals.index, signals.columns
# #     if weight_labels is None:
# #         weight_labels = [[]] * len(weights)
# # weight_labels=weight_labels[weight[0]]
#     w0 = [weightings(signals, weights=weights[weight[0]],
#                      condition_labels=condition_labels[weight[0]])
#           for weight in enumerate(weights)]
#     slist = [signals]+w0
#     data = np.prod(np.array(slist), axis=0)
#     if standardize is True:
#         data = StandardScaler().fit_transform(data)
#     return pd.DataFrame(data, index=orig_index, columns=orig_cols)


In [None]:
# get_weighted_signals(cdict2.whole.signals, cdict2[key].signals,
#                      cdict2[key].condition_labels)

In [None]:
sess00.events.iloc[:, -1]

In [None]:


# trial_type_labels = [sess00.events.trial_type, sess00.events.recognition_performance,
#                      sess00.events.ctl_miss_ws_cs]
# cr_results = [validate_model(ovrc, top_features,#cdict2.signal_matrix,
#                                   y=ttlabels)
#                     for ttlabels in trial_type_labels]

# [cr_results[col[0]].to_csv(f'/data/simexp/fnadeau/best_features/{sess00.sub_id}_{sess00.ses_id}_{col[1]}-classification-report.tsv',
#                 sep='\t', index=True, encoding='UTF-8-SIG')
#  for col in enumerate(['trial_type', 'recognition_performance', 'ctl_miss_ws_cs'])]
# display(*crossval_results)

In [None]:
svc = SVC(tol=0.0001,
          kernel='linear',
          cache_size=5,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)

ovrc = OneVsRestClassifier(svc)

X, y = cdict2.signal_matrix.astype(np.float16), sess00.events.iloc[:, -1]

validation_params = dict(test_size=0.4, shuffle=True,
                         stratify=y, random_state=None)

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    **validation_params)
ovrc.fit(X_train, y_train)
    
    
result = permutation_importance(ovrc, X_train, y_train, n_repeats=10, random_state=42)
result

In [None]:
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

from sklearn.cluster import FeatureAgglomeration
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import image
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier





clf = RandomForestClassifier(n_estimators=100, random_state=42)

X, y = cdict2.signal_matrix, sess00.events.iloc[:, -1]

validation_params = dict(test_size=0.4, shuffle=True,
                         stratify=y, random_state=None)

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    **validation_params)
clf.fit(X_train, y_train)
    
    
result = permutation_importance(clf, X_train, y_train, n_repeats=10, random_state=42)
perm_sorted_idx = result.importances_mean.argsort()

tree_importance_sorted_idx = np.argsort(clf.feature_importances_)
tree_indices = np.arange(0, len(clf.feature_importances_)) + 0.5

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))
ax1.barh(tree_indices, clf.feature_importances_[tree_importance_sorted_idx], height=0.7)
ax1.set_yticks(tree_indices)
ax1.set_yticklabels(data.feature_names[tree_importance_sorted_idx])
ax1.set_ylim((0, len(clf.feature_importances_)))
ax2.boxplot(
    result.importances[perm_sorted_idx].T,
    vert=False,
    labels=data.feature_names[perm_sorted_idx],
)
fig.tight_layout()
plt.show()


In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))
corr = spearmanr(X).correlation

# Ensure the correlation matrix is symmetric
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)

# We convert the correlation matrix to a distance matrix before performing
# hierarchical clustering using Ward's linkage.
distance_matrix = 1 - np.abs(corr)
dist_linkage = hierarchy.ward(squareform(distance_matrix))
dendro = hierarchy.dendrogram(
    dist_linkage, labels=data.feature_names.tolist(), ax=ax1, leaf_rotation=90
)
dendro_idx = np.arange(0, len(dendro["ivl"]))

ax2.imshow(corr[dendro["leaves"], :][:, dendro["leaves"]])
ax2.set_xticks(dendro_idx)
ax2.set_yticks(dendro_idx)
ax2.set_xticklabels(dendro["ivl"], rotation="vertical")
ax2.set_yticklabels(dendro["ivl"])
fig.tight_layout()
plt.show()

In [None]:





mlpX = fullweighted_scaled
# mlpY = y = ctl_miss_ws_cs_mat[1]

mlp = MLPClassifier(solver='adam', activation='logistic',
                    hidden_layer_sizes=mlpX.shape,
                    max_iter=3000, max_fun=225000)

def validate_mlp(estimator,
                 X, y,
                 test_size=0.6,
                 shuffle=True,
                 random_state=0,
                 **kwargs):

    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.4,
                                                        shuffle=True,
                                                        stratify=y,
                                                        random_state=None)

#     mlp = MLPClassifier(solver=solver, activation=activation,
#                         hidden_layer_sizes=mlpX.shape,
#                         max_iter=3000, max_fun=225000)

    estimator.fit(X_train, y_train)

#     mlp_y_pred = estimator.predict(X_test)
    acc_score = accuracy_score(y_pred=estimator.predict(X_test), y_true=y_test)

#     cr = pd.DataFrame(classification_report(y_pred=mlp_y_pred,
#                                             y_true=y_test,
#                                             output_dict=True,
#                                             zero_division=1))
    return acc_score

loo_results = pd.Series(((validate_mlp(mlp, X=fws_squared.drop(col,axis=1),
                                       y=sess00.events.iloc[:, -1])
                          for col in tqdm(fws_squared.columns))),
                        index=fws_squared.columns)
# display(*[validate_mlp(mlp, X=mlpX, y=sess00.events.trial_type),
#           validate_mlp(estimator=mlp, X=mlpX, y=sess00.events.recognition_performance),
#           validate_mlp(estimator=mlp, X=mlpX, y=ctl_miss_ws_cs_mat[1])])
# mlp.score(X_train, y_train)

In [None]:
from sklearn import pipeline
help(pipeline)

In [None]:
import os
import skimage
from pathlib import Path
from PIL import Image
from skimage.transform import resize
from sklearn.decomposition import FastICA, PCA # IncrementalPCA
from sklearn.utils import Bunch
from sklearn.feature_extraction.image import extract_patches_2d

impaths = sorted(filter(os.path.isfile, sorted(Path('/data/simexp/fnadeau/wmstim/').iterdir())))
imdict = pd.DataFrame(((StandardScaler().fit_transform(np.uint8(resize(skimage.io.imread(
             apath, as_gray=True), (250, 250), anti_aliasing=True).flatten()
                                                                for apath in impaths)))),
                      index=list(map(str.lower, list(map(os.path.basename, impaths)))))

stimlist = sess00.events.stim_file.fillna('empty_box_gris.bmp').str.lower().tolist()
stimdata = imdict.loc[stimlist]

In [None]:
# pca00 = PCA().fit(mainmat,sess00.events.iloc[:,-1])
# ica00 = FastICA(max_iter=600, tol=0.05).fit(mainmat)

# ica_results00 = ica00.fit_transform(mainmat)
# validate_model(X=ica_results00,y=sess00.events.iloc[:,-1])

In [None]:
loo_results = pd.read_csv('/data/simexp/fnadeau/loo_1024_results.tsv',
                          sep='\t', index_col='difumo_names').rename({'0':'accuracy'}, axis=1)
list(loo_results.sort_values('accuracy', ascending=True).iterrows())

In [None]:
loo_results.to_csv('/data/simexp/fnadeau/loo_1024_results.tsv',
                   sep='\t', encoding='UTF-8-SIG', index='difumo_names')

In [None]:
mlp = MLPClassifier(solver='adam', activation='logistic',
                    hidden_layer_sizes=mlpX.shape,
                    max_iter=3000, max_fun=225000)

def leave_one_out(X,y,n_tests=8,**kwargs):
    return pd.DataFrame(((pd.Series((validate_mlp(estimator=mlp,X=X.drop(col,axis=1), y=y)
                                   for col in X.columns),
                                  index=X.columns)
                          for n in tqdm(range(n_tests)))))
# loss = leave_one_out(X=fullweighted_scaled, y=ctl_miss_ws_cs_mat[1])

In [None]:
# help(FeatureAgglomeration)n_clusters=#fullweighted_scaled


In [None]:
display(*[validate_model(X = fullweighted_scaled,
                        y = enc_ctl_mat[1]),
          validate_model(X = fullweighted_scaled,
                        y = recog_perfo_mat[1]),
          validate_model(X = fullweighted_scaled,
                        y = ctl_miss_ws_cs_mat[1])])

In [None]:
y_all = [sess00.events.trial_type, sess00.events.recognition_performance,
         sess00.events.ctl_miss_ws_cs]
X_train, X_test, y_train, y_test = train_test_split(fullweighted_scaled,
                                                    sess00.events.trial_type,
                                                    test_size=0.7,
                                                    shuffle=True,
                                                    stratify=sess00.events.trial_type,
                                                    random_state=None)
mlp = MLPClassifier(solver='adam', activation='logistic',
                    hidden_layer_sizes=mlpX.shape,
                    max_iter=3000, max_fun=225000)

mlp.fit(X_train,y_train)

round(len(list(filter(None, mlp.predict(X_test)==y_test)))/len(y_test)*100, 2)

In [None]:
from operator import itemgetter

coefs_, intercepts_, loss_curve = itemgetter(*['coefs_','intercepts_','loss_curve_'])(mlp.__dict__) 

In [None]:
fullweighted_scaled.T

In [None]:
# best2 = FeatureAgglomeration().fit_transform(fullweighted_scaled)
# best2
# help(FeatureAgglomeration)
from sklearn.decomposition import IncrementalPCA, PCA, FastICA

pca = PCA().fit(fullweighted_scaled,fullweighted_scaled.index)
ica = FastICA(max_iter=600, tol=0.05).fit(fullweighted_scaled)

# pca_results = pca.fit_transform(fullweighted_scaled.T,fullweighted_scaled.columns) #sess00.events.trial_type)
# pca.shape
# help(FastICA)

In [None]:
pd.DataFrame(pca.components_,index=fullweighted_scaled.index,
             columns=fullweighted_scaled.columns)

In [None]:
pd.DataFrame(ica.components_,index=fullweighted_scaled.index,
             columns=fullweighted_scaled.columns)
ica.feature_names_in_

In [None]:
ica.__dict__

In [None]:
pca.__dict__.keys()
pd.Series(pca.explained_variance_ratio_,index=pca.feature_names_in_)

In [None]:
import sklearn
from sklearn.metrics import pairwise_distances

aggclus = AgglomerativeClustering(n_clusters=16, affinity='euclidean', memory=None,
                                  connectivity=conn, compute_full_tree=True, linkage='ward',
                                  distance_threshold=None, compute_distances=True)

# X_train, X_test, y_train, y_test = train_test_split(X, y,
#                                                         **validation_params)
# aggclus.
mds = MDS(n_components=32, metric=True, n_init=4,
          max_iter=300, eps=0.001, random_state=None,
          dissimilarity='precomputed')
newdims = mds.fit_transform(pairwise_distances(fullweighted_scaled))

from sklearn.covariance import EmpiricalCovariance, MinCovDet
from sklearn.inspection import partial_dependence

empcov = EmpiricalCovariance(store_precision=True, assume_centered=True)

covmat = pd.DataFrame(empcov.fit(cdict2.signal_matrix).covariance_,
                      index=cdict2.signal_matrix.columns,
                      columns=cdict2.signal_matrix.columns)

# help(MDS)
from sklearn.covariance import MinCovDet
from sklearn.inspection import partial_dependence
from sklearn.covariance import MinCovDet
from sklearn.manifold import MDS

from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering

from nilearn.connectome import ConnectivityMeasure

cm = ConnectivityMeasure(cov_estimator=MinCovDet(),
                         kind='covariance')

# conn = cm.fit_transform([MinMaxScaler().fit_transform(fullweighted_scaled.values)])

conn = cm.fit_transform([cdict2.signal_matrix.values])
conn
# sns.heatmap(conn[0])
# pd.DataFrame(MinMaxScaler().fit_transform(cdict2.signal_matrix.cov()),
#              index=cdict2.signal_matrix.columns,
#              columns=cdict2.signal_matrix.columns).min().min()
# partdep = partial_dependence(mcd, cdict2.signal_matrix, cdict2.signal_matrix.columns,
#                              response_method='auto', percentiles=(0.05, 0.95),
#                              grid_resolution=cdict2.signal_matrix.shape[0],
#                              method='auto', kind='legacy')

In [None]:
# dims=[64,128,256,512,1024]

# [get_difumo_cut_coords(dim, resolution_mm=3,
#                        data_dir=atlases_dir,
#                        as_dataframe=False,
#                       output_dir='/data/simexp/fnadeau/')
#  for dim in dims]

# fullweighted_scaled.shape
# sess00.events
# label_list = [itm[1] for itm in list(sess00.events[['trial_type','recognition_performance',
#                                                 'ctl_miss_ws_cs']].iteritems())]


In [None]:
difumo = get_difumo(data_dir=atlases_dir, dimension=256, resolution_mm=3)

contrasts = [[NiftiMapsMasker(maps_img=difumo.maps,
                              mask_img=sess00.mask_img,
                              resampling_target='maps').fit(sess00.fmri_img).inverse_transform(
                 fullweighted_scaled.set_index(labels).loc[trial]) for trial in tuple(set(labels))]
             for labels in label_list]
# help(NiftiMapsMasker)

In [None]:
c2 = list(more_itertools.flatten(contrasts))

len(c2)

In [None]:


# ipca00 = IncrementalPCA().fit(X, X.columns)
pca00 = PCA().fit(X, X.columns)
# ipca_table = pd.DataFrame(ipca00.explained_variance_)
pca_table = pd.DataFrame(pca00.explained_variance_ratio_)
# # ipca00.__dict__
# ipca_table = pd.DataFrame(ipca00.explained_variance_ratio_)
# pca_table = pd.DataFrame(pca00.explained_variance_ratio_)
display(pca_table)
sorted(ipca00.__dict__.keys())
# Bunch(**pca00.__dict__)


In [None]:
pca00.singular_values_, pca00.feature_names_in_.shape

In [None]:
from sklearn.feature_selection import SelectKBest

trial_labels = [ev.trial_type for ev in test_glm_events[1:]]
bestof = np.array([SelectKBest().fit(fullweighted, ev).get_feature_names_out()
          for ev in trial_labels])
# [best.shape for best in bestof]
allbest = sorted(set(bestof.flatten()))
allbest

In [None]:
fullweighted[allbest]

In [None]:
sess00.events

In [None]:
ctl_miss_ws_cs_mat[0][allbest]

In [None]:

# # Control Trials Within-Category Correlations Across Labels
# enc_ctl_ctl = enc_ctl_mat[0][allbest].set_index(
#                          sess00.events.trial_type).loc['Ctl'].T.corr()
# recognition_performance_ctl = recog_perfo_mat[0][allbest].set_index(
#                          sess00.events.recognition_performance).loc['Ctl'].T.corr()
# ctl_miss_ws_cs_ctl = ctl_miss_ws_cs_mat[0][allbest].set_index(
#                          sess00.events.ctl_miss_ws_cs).loc['Ctl'].T.corr()

# ctl_mats = [enc_ctl_ctl, recognition_performance_ctl, ctl_miss_ws_cs_ctl]
# [mat.where(mat.values!=np.tril(mat.values), inplace=True) for mat in ctl_mats]

# [mat.where(mat.values!=np.diag(mat.values), inplace=True) for mat in ctl_mats]

# enc_ctl_ctl, recognition_performance_ctl, ctl_miss_ws_cs_ctl = ctl_mats
# # fullweighted_euc = pd.DataFrame(pairwise_distances(fullweighted[allbest]),
# #                                 columns=sess00.events.trial_type)
# sns.set(rc={'figure.figsize': (15,15)})
# display(sns.heatmap(enc_ctl_ctl), sns.heatmap(recognition_performance_ctl),
#         sns.heatmap(ctl_miss_ws_cs_ctl))

In [None]:
import os
import skimage
import imageio
from pathlib import Path
from sklearn.feature_extraction.image import extract_patches_2d

impaths = sorted(filter(os.path.isfile, sorted(Path('/data/simexp/fnadeau/wmstim/').iterdir())))
ctl_stim = '/data/simexp/fnadeau/wmstim/empty_box_gris.bmp'
stimfiles = [apath for apath in impaths if os.path.basename(apath) in
             list(map(str.lower, sess00.events.stim_file.fillna(ctl_stim)))]
# stimfiles

In [None]:
from PIL.Image import antialias

In [None]:

# reshaped = np.array([np.reshape(p, (-1, 2)) for p in patches])
patches = np.array(patches)
euc_imgs = pairwise_distances(patches)
# [p.shape for p in reshaped]

In [None]:
from sklearn.metrics import pairwise_distances
euc_trials = pd.DataFrame(pairwise_distances(X),
                           index=sess00.events.trial_type,
                           columns=fullweighted.columns)

# euc_rois = pd.DataFrame(pairwise_distances(X.T),
#                            columns=X.columns,
#                         index=X.columns)
# euc_vectors.shape, euc_rois.shape
sns.heatmap(euc_trials)

In [None]:
from sklearn.covariance import EmpiricalCovariance
roi_cov = EmpiricalCovariance(assume_centered=False).fit(euc_rois).covariance_
sns.heatmap(roi_cov)

In [None]:
from nilearn.glm.second_level import make_second_level_design_matrix
from nilearn.glm.second_level import SecondLevelModel
first_level_list = list(val.model for val in contrasts_effsiz.values())
first_designs = [glm.design_matrices_[0] for glm in first_level_list]

secondglm = SecondLevelModel().fit(first_level_list,
                                   design_matrix=first_designs)
# design_matrix = make_second_level_design_matrix(subjects_label,
#                                                 extra_info_subjects)

In [None]:
secondglm.generate_report(list(val.contrast_img.get_fdata() for val in contrasts_effsiz.values()))

In [None]:
# sns.set(rc={'figure.figsize': (15,15)})
covmap = X.cov().where(X.cov()!=np.tril(X.cov().values))
covmap = covmap.where(covmap!=np.diag(covmap.values))
covmap = covmap.where(covmap.values!=0)
covmap = covmap.dropna(axis=1,how='all').dropna(axis=0,how='all')
centers = pd.Series([(row[1].dropna().quantile(0.1),
                      row[1].dropna().quantile(0.9))
                     for row in covmap.iterrows()],
                    index=covmap.index)
# sns.heatmap(covmap.where(~covmap.isin(centers.values)))
# highest = covmap.where([~row[1].between(*centers.loc[row[0]])
#                        for row in covmap.iterrows()])
# sns.heatmap(covmap)

In [None]:
selected = selector.fit(X, X.index).get_feature_names_out()

In [None]:
X = pd.DataFrame(StandardScaler().fit_transform(fullweighted),
                 columns=fullweighted.columns,
                 index=fullweighted.index)

y = enc_ctl_mat[1]

selector = RFE(estimator=svc,
               n_features_to_select=0.2)
# selector.fit(X).get_feature_names_out()
validation_params00 = dict(test_size=0.4, shuffle=True,
                           stratify=wholemat.index, random_state=None)

X_train00, X_test00, y_train00, y_test00 = train_test_split(X, y,
                                                            **validation_params00)

ovrc.fit(X_train00, y_train00)

y_pred_test = ovrc.predict(X_test00)

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

conf_mat = confusion_matrix(y_true=y_test00, y_pred=y_pred_test,
                            labels=y.unique())
# y_pred = cross_val_predict(svc, X_test00, y_test00,
#                            groups=y_test00, cv=5)
acc = accuracy_score(y_pred=y_pred_test, y_true=y_test00)
cr_test = pd.DataFrame(classification_report(y_pred=y_pred_test,
                                             y_true=y_test00,
                                             output_dict=True))

display(cr_test)

In [None]:

ConfusionMatrixDisplay(conf_mat).plot()

In [None]:
[{trial: np.array(test_glm_event.trial_type==trial).astype(int)
 for trial in test_glm_event.trial_type}
 for test_glm_event in test_glm_events]

In [None]:
imdict

In [None]:
vistemp_rois = difumo_cut_coords.set_index('difumo_names').T.filter(regex='visual|occip|temp').columns
vistemp_rois

In [None]:
dot_trials = test_signals_matrix.dot(
                 test_signals_matrix.iloc[:-1,:].T).iloc[:-1,:].set_index(
                     sess00.events.trial_type).set_axis(sess00.events.trial_type, axis=1)
axis_range = range(int(round(dot_trials.mean().min(), 0)),
                   int(round(dot_trials.mean().max(), 0)))

def autocorr_plot(inpt):
    import statsmodels.tsa.api as smt
    acf = smt.graphics.plot_acf(inpt, lags=40 , alpha=0.05)
    acf.show()

autocorr_plot(dot_trials.mean())
# sns.regplot(dot_trials.mean(), dot_trials.mean())
# dot_trials.plot.scatter(x=dot_trials.index, y=axis_range)
# sns.heatmap(dot_trials.where(dot_trials!=np.tril(dot_trials.values)))
# sns.scatterplot(dot_trials.mean())

In [None]:
enc_ctl = test_signals_matrix[vistemp_rois].iloc[:-1].set_index(sess00.events.trial_type)
recog_perfo = test_signals_matrix[vistemp_rois].iloc[:-1].set_index(sess00.events.recognition_performance)
ctl_miss_ws_cs = test_signals_matrix[vistemp_rois].iloc[:-1].set_index(sess00.events.ctl_miss_ws_cs)

scaled = pd.DataFrame(StandardScaler().fit_transform(ctl_miss_ws_cs.values),
                      columns=enc_ctl.columns, index=ctl_miss_ws_cs.index)

svc = SVC(tol=0.05,
          kernel='linear',
          cache_size=500,
          decision_function_shape='ovr',
          class_weight='balanced',
          probability=True)
ovrc = OneVsRestClassifier(svc)
selector = RFE(estimator=svc,
               n_features_to_select=0.5)

simple_signals_matrix
enc_ctl_names = selector.fit(enc_ctl, enc_ctl.index).get_feature_names_out()
recog_perfo_names = selector.fit(recog_perfo, recog_perfo.index).get_feature_names_out()
ctl_miss_ws_cs_names = selector.fit(ctl_miss_ws_cs, ctl_miss_ws_cs.index).get_feature_names_out()

In [None]:
SVC(zero_division=0)

In [None]:
features = enc_ctl_names.tolist()+recog_perfo_names.tolist()+ctl_miss_ws_cs_names.tolist()
features = set(features)

In [None]:
features_list = [enc_ctl[features], recog_perfo[features], ctl_miss_ws_cs[features]]
[itm.shape for itm in features_list]

In [None]:
features_list[2]

In [None]:
recog_perfo_scores = []
rpmatrix = recog_perfo[recog_perfo_names]
for col in rpmatrix.columns:
    X, y = rpmatrix[[col]], rpmatrix.index.tolist()
    X_train00, X_test00, y_train00, y_test00 = train_test_split(X, y,
                                                                test_size=0.4,
                                                                shuffle=True,
                                                                random_state=None,
                                                                stratify=rpmatrix.index)

    ovrc.fit(X_train00, y_train00)

    y_pred_test = ovrc.predict(X_test00)
    # y_pred = cross_val_predict(svc, X_test00, y_test00,
    #                            groups=y_test00, cv=5)
    acc = accuracy_score(y_pred=y_pred_test, y_true=y_test00)
    cr_test = pd.DataFrame(classification_report(y_pred=y_pred_test,
                                                 y_true=y_test00,
                                                 output_dict=True,
                                                 zero_division=0))
    recog_perfo_scores.append(acc)
rp_acc = pd.Series(data=recog_perfo_scores,
                   name='rp_acc',
                   index=rpmatrix.columns).sort_values(ascending=False)
rp_acc

In [None]:

total_scores = []
for label_type in features_list:
    matrix = label_type
    perfo_list = []
    for col in matrix.columns:
        X, y = matrix[[col]], matrix.index.tolist()
        X_train00, X_test00, y_train00, y_test00 = train_test_split(X, y,
                                                                    test_size=0.4,
                                                                    shuffle=True,
                                                                    random_state=None,
                                                                    stratify=matrix.index)

        ovrc.fit(X_train00, y_train00)

        y_pred_test = ovrc.predict(X_test00)
        # y_pred = cross_val_predict(svc, X_test00, y_test00,
        #                            groups=y_test00, cv=5)
        acc = accuracy_score(y_pred=y_pred_test, y_true=y_test00)
        cr_test = pd.DataFrame(classification_report(y_pred=y_pred_test,
                                                     y_true=y_test00,
                                                     output_dict=True,
                                                     zero_division=0))
        perfo_list.append(acc)
    total_scores.append(perfo_list)
#     perfo_list.append(f'ROI: {col}\n\nTest Accuracy\n{acc}\n\nTest Classification Report{cr_test}')


In [None]:
accdf = pd.DataFrame(total_scores, columns=features_list[0].columns,
                     index=['enc_ctl', 'recog_perfo',
                            'ctl_miss_ws_cs']).T.sort_values('recog_perfo',
                                                             ascending=False)
accdf
# np.array(total_scores).shape

In [None]:


def roi_based_score(X, y, test_size=0.4, shuffle=True,
                    stratify=None, random_state=None):
    scores = []
    validation_params = dict(test_size=test_size, shuffle=shuffle,
                             stratify=stratify, random_state=random_state)
    for col in X.columns:

        X_transformed = StandardScaler().fit_transform(X)
        X_train, X_test, y_train, y_test = train_test_split(X_transformed,
                                                            y.values.tolist(),
                                                            **validation_params)
        svc = SVC(tol=0.01, kernel='linear',
                  cache_size=5,
                  decision_function_shape='ovo',
                  class_weight='balanced',
                  probability=True).fit(X_train, y_train)
        ovoc = OneVsOneClassifier(svc)
        # Cross-Validation
#         cv = pd.Series(y_train).value_counts().min()
#         y_pred_train = cross_val_predict(ovoc, X_train, y_train,
#                                            groups=y_train, cv=cv)
#         # Scores
#         cv_acc_train = cross_val_score(ovoc, X_train, y_train,
#                                          groups=y_train, cv=cv)
#         overall_acc = accuracy_score(y_pred=y_pred_train, y_true=y_train)
        # Testing Performance
        ovoc.fit(X_train, y_train)
        y_pred_test = ovoc.predict(X_test)
        acc_test = ovoc.score(X_test, y_test) # get accuracy
        scores.append(acc_test)
    return pd.DataFrame(scores, index=X.columns,
                        columns=['Accuracy'])

contrasts_effsiz = get_contrasts(session=sess00,
                                 masker=spheres_masker,
                                 output_type='effect_size',
                                 glm_kws={'signal_scaling': False,
#                                           'high_pass': 0.17,
                                          'standardize': True},
                                 trial_type_col='ctl_miss_ws_cs',
                                 labels=difumo_cut_coords.difumo_names)

# result_roi = roi_based_score(X=contrasts_effsiz.signals.iloc[:-1, :],
#                              y=sess00.events.trial_type)


In [None]:
enc_ctl = contrasts_effsiz.signals.iloc[:-1,:].set_index(sess00.events.trial_type)
recog_perfo = contrasts_effsiz.signals.iloc[:-1,:].set_index(sess00.events.recognition_performance)
ctl_miss_ws_cs = contrasts_effsiz.signals.iloc[:-1,:].set_index(sess00.events.ctl_miss_ws_cs)

scaled = pd.DataFrame(StandardScaler().fit_transform(ctl_miss_ws_cs.values),
                      columns=enc_ctl.columns, index=ctl_miss_ws_cs.index)



In [None]:

# import seaborn as sns
# mask = enc_ctl.cov().where(enc_ctl.cov() == np.triu(enc_ctl.cov().values))
# mask = mask.where(mask.values!=np.diag(mask.values))
# mask = mask.where(mask.values!=0)

# centers = pd.Series([(row[1].dropna().quantile(0.1),
#                       row[1].dropna().quantile(0.9))
#                      for row in mask.iterrows()],
#                     index=mask.index)
# highest = mask.where([~row[1].between(*centers[row[0]])
#                       for row in mask.iterrows()])

# sns.set(rc={'figure.figsize': (15,15)})
# sns.heatmap(highest, square=True)
# skip_diag_strided(mask)
# sns.heatmap(data=enc_ctl.cov(), mask=enc_ctl.cov().where(enc_ctl.cov()==np.triu(enc_ctl.cov().values)))
# mask=enc_ctl.cov().where()

In [None]:
# enc_ctl.plot.scatter(y=enc_ctl.index, x=enc_ctl.columns)
# sns.scatterplot(data=)
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler


minmax_scaled = pd.DataFrame(MinMaxScaler().fit_transform(enc_ctl.values),
                      columns=enc_ctl.columns, index=enc_ctl.index)
maxabs_scaled = pd.DataFrame(MaxAbsScaler().fit_transform(enc_ctl.values),
                      columns=enc_ctl.columns, index=enc_ctl.index)

sns.set(rc={'figure.figsize': (10,10)})
# sns.scatterplot(data=scaled.groupby(scaled.index).mean())
# sns.scatterplot(data=maxabs_scaled.values)
# fig, ax = plt.subplots(4, 8, sharex='col', sharey='row')
# [scaled.T[col].plot.scatter() for col in scaled.T.columns]
# [sns.scatterplot(data=row[1], figure=fig, ax=ax) for row in scaled.T.iterrows()]
# mask.where(~np.array([row[1].between(-1.9, 0.2)
#             for row in mask.iterrows()]))
# mask.describe()#.max()-mask.min()
# [row[1].dropna().max()-row[1].dropna().min()
#  for row in mask.iterrows()]




In [None]:
X00 = enc_ctl[best_of]
y00 = maxabs_scaled.index
# X00, y00 = signal_table00, signal_table00.index
from sklearn.feature_selection import VarianceThreshold
from sklearn.feature_selection import RFECV

validation_params00 = dict(test_size=0.3, shuffle=True,
                           stratify=y00, random_state=None)

X_transformed00 = StandardScaler().fit_transform(X00)

X_train00, X_validate00, y_train00, y_validate00 = train_test_split(X_transformed00, y00,
                                                                    **validation_params00)
cv = y_train00.value_counts().min()

svc = SVC(tol=0.05,
                              kernel='poly',
                              cache_size=500,
                              decision_function_shape='ovr',
                              class_weight='balanced',
                              probability=True)

svc.fit(X_train00, y_train00)
accuracy_score(y_pred=svc.predict(X_validate00), y_true=y_validate00)
cr_test = pd.DataFrame(classification_report(y_pred=y_pred_test00,
                                               y_true=y_validate00,
                                               output_dict=True))
# cross_val_predict(svc, X_train00, y_train00,
#                                    groups=y_train00, cv=cv)

In [None]:


# Cross-Validation
y_pred_train00 = cross_val_predict(ovoc, X_train00, y_train00,
                                   groups=y_train00, cv=cv)
# Scores
cv_acc_train00 = cross_val_score(ovoc, X_train00, y_train00,
                                 groups=y_train00, cv=cv)

# Overall model performance evaluation on training data
overall_acc00 = accuracy_score(y_pred=y_pred_train00, y_true=y_train00)
overall_cr00 = pd.DataFrame(classification_report(y_pred=y_pred_train00,
                                                  y_true=y_train00,
                                                  output_dict=True))
# Testing Performance
ovoc.fit(X_train00, y_train00)
y_pred_test00 = ovoc.predict(X_validate00)
acc_test00 = ovoc.score(X_validate00, y_validate00) # get accuracy
acc_test200 = accuracy_score(y_pred=y_pred_test00, y_true=y_validate00)
cr_test00 = pd.DataFrame(classification_report(y_pred=y_pred_test00,
                                               y_true=y_validate00,
                                               output_dict=True))

percent_train_ok = round((len(list(filter(None, y_pred_train00==y_train00))) /
                          len(y_train00))*100, 2)

percent_ok = round((len(list(filter(None, y_pred_test00==y_validate00))) /
                    len(y_validate00))*100, 2)

train_report00 = '\n\n'.join(['Training Performance Evaluation',
                              f'Training Predictions: {y_pred_train00}',
                              f'Training True Labels: {y_train00.values}',
                              f'Training % Correct: {percent_train_ok}',
                              f'Cross-Validation Accuracy:\n{cv_acc_train00}',
                              f'Overall Accuracy:\n{overall_acc00}',
                              f'Overall Classification Report:\n{overall_cr00}'])

validate_report00 = '\n\n'.join(['Test Performance Evaluation',
                                 f'Predictions: {y_pred_test00}',
                                 f'True Labels: {y_validate00.values}',
                                 f'% Correct: {percent_ok}',
                                 f'Test Score: {acc_test00}',
                                 f'Test Accuracy: {acc_test200}',
                                 f'Classification Report\n{cr_test00}'])

from sklearn.model_selection import cross_validate
from sklearn.metrics import recall_score
scoring = ['precision_macro', 'recall_macro']
scoring200 = ['explained_variance', 'r2']
scores00 = pd.DataFrame(cross_validate(ovoc, X00, y00,
                                       scoring=scoring))

print(f'{scores00}\n\n{train_report00}\n{validate_report00}')

In [None]:
enc_ctl.cov().where(enc_ctl.cov()==np.triu(enc_ctl.cov().values))

In [None]:
covmat = contrasts03_test.signals.cov().values

def skip_diag_strided(A):
    m = A.shape[0]
    strided = np.lib.stride_tricks.as_strided
    s0,s1 = A.strides
    return strided(A.ravel()[1:], shape=(m-1,m), strides=(s0+s1,s1)).reshape(m,-1)

pd.DataFrame(skip_diag_strided(np.tril(covmat))).max()

In [None]:
dual_X_train = np.array(X_train00.tolist()+X_train01.tolist())
dual_y_train = np.array(y_train00.tolist()+y_train01.tolist())
dual_X_validate = np.array(X_validate00.tolist()+X_train01.tolist())
dual_y_validate = np.array(y_validate00.tolist()+y_validate01.tolist())

In [None]:
# ovoc.fit(X_train01, y_train01)
# dual_X_validate.shape,dual_y_validate.shape
(dual_X_train.shape, dual_y_train.shape), dual_X_validate.shape, dual_y_validate.shape

In [None]:
ovoc.fit(dual_X_train, dual_y_train)

In [None]:
len(list(filter(None, ovoc.predict(X_validate00)==y_validate00)))/len(y_validate00)

In [None]:
import seaborn as sns
outcome_means = signal_table.groupby(signal_table.index).mean()

In [None]:
outcome_means.boxplot()

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm


def make_meshgrid(x, y, h=0.02):
    """
    Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    """
    Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out

# X = signal_table
# y = signal_table.index

plot_contours()

In [None]:


# we create an instance of SVM and fit out data. We do not scale our
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
models = (
    svm.SVC(kernel="linear", C=C),
    svm.LinearSVC(C=C, max_iter=10000),
    svm.SVC(kernel="rbf", gamma=0.7, C=C),
    svm.SVC(kernel="poly", degree=3, gamma="auto", C=C),
)
models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = (
    "SVC with linear kernel",
    "LinearSVC (linear kernel)",
    "SVC with RBF kernel",
    "SVC with polynomial (degree 3) kernel",
)

# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2)
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel("Sepal length")
    ax.set_ylabel("Sepal width")
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()

In [None]:
from sklearn.decomposition import PCA
pca00=PCA().fit(signal_table.T)
pca_table = pd.DataFrame(pca00.explained_variance_ratio_)
#                          index=signal_table.columns,
#                          columns=['var_ratio']).sort_values(
#                              'var_ratio',ascending=False)
pca_table

In [None]:
sorted(ovoc.__dict__['estimators_'][0].__dir__())
# ovoc.__dict__['estimators_'][0].predict_proba(X_validate)
# ovoc.__dict__['estimator'].
# sklearn.metrics.SCORERS['explained_variance'](ovoc, X_validate, y_validate)

In [None]:




ovoc = OneVsOneClassifier(SVC(tol=0.0001, kernel='poly',
                              cache_size=600,
                              decision_function_shape='ovo',
                              class_weight='balanced',
#                               break_ties=True,
#                               gamma='auto', degree=3, coef0=0.0,
                              probability=True))

ovoc_pipe = make_pipeline(scaler, ovoc)

ovo_cv_acc = cross_val_score(ovoc, X_test, y_test,
                             groups=y_test, cv=cv)


# .fit(X_train_transformed, y_train)

ovoc_score = ovoc.score(X_test_transformed, y_test)
# cv = sess00.events.ctl_miss_ws_cs.value_counts().min() -1


# ovrc = OneVsRestClassifier(SVC(tol=0.0001, kernel='poly',
#                                cache_size=600,
#                                decision_function_shape='ovr',
#                                class_weight='balanced',
#                                break_ties=True,
# #                                gamma='auto', degree=3, coef0=0.0,
#                                probability=True))



# ovrc.fit(X_train, y_train), ovoc.fit(X_train, y_train)

ovo_cr = classification_report(y_pred=ovoc.predict(X_test),
                               y_true=y_test, zero_division=0)
ovr_cv_acc = cross_val_score(ovrc, X_test, y_test,
                              groups=y_test, cv=cv)
# ovr_cr = classification_report(y_pred=ovrc.predict(X_test),
#                                y_true=y_test, zero_division=0)




print('\n\n'.join([f'Training: {len(y_train)}, Testing: {len(y_test)}',
                   f'True Labels:\n{y_test.values}',
                   f'OVR Predicted Labels:\n{ovrc.predict(X_test)}',
                   f'OVO Predicted Labels:\n{ovoc.predict(X_test)}',                   
                   ', '.join([f'OVR Accuracy: {ovrc.score(X_test,y_test)}',
                              f'OVO Accuracy: {ovoc.score(X_test,y_test)}']),
                   f'OVR Cross-Validation:\n{ovr_cv_acc}',
                   f'OVO Cross-Validation:\n{ovo_cv_acc}',
                   f'OVR CR:\n{ovr_cr}',
                   f'OVO CR:\n{ovo_cr}']))



In [None]:
s_data = [sess00.sub_id]

# define the model


# s_data.append(overall_acc)

# Test model on unseen data from the test set
ovoc.fit(X_train, y_train)
y_pred_test = ovoc.predict(X_test) # classify age class using testing data
acc_test = sub_svc.score(X_test, y_test) # get accuracy

cr_test = classification_report(y_pred=y_pred_test, y_true=y_test) # get prec., recall & f1
# print results
print('\n\n'.join(['Test Performance Evaluation',
                   f'accuracy: {acc_test}',
                   f'Classification Report\n{cr_test}']))

s_data.append(acc)

# get map of coefficients    

#Save .nii to file
# coef_img.to_filename(os.path.join(output_dir, 'Coef_maps', 'SVC_coeff_enc_ctl_sub-'+str(sub)+'.nii'))

# enc_ctl_data = enc_ctl_data.append(pd.Series(s_data, index=enc_ctl_data.columns), ignore_index=True)

# demo_data = sub_data.copy()
# demo_data.reset_index(level=None, drop=False, inplace=True)

# enc_ctl_data.insert(loc = 1, column = 'cognitive_status', value = demo_data['cognitive_status'],
#                 allow_duplicates=True)
# enc_ctl_data.insert(loc = 2, column = 'total_scrubbed_frames', value = demo_data['total_scrubbed_frames'],
#                 allow_duplicates=True)
# enc_ctl_data.insert(loc = 3, column = 'mean_FD', value = demo_data['mean_FD'], allow_duplicates=True)
# enc_ctl_data.insert(loc = 4, column = 'hits', value = demo_data['hits'], allow_duplicates=True)
# enc_ctl_data.insert(loc = 5, column = 'miss', value = demo_data['miss'], allow_duplicates=True)
# enc_ctl_data.insert(loc = 6, column = 'correct_source', value = demo_data['correct_source'],
#                 allow_duplicates=True)
# enc_ctl_data.insert(loc = 7, column = 'wrong_source', value = demo_data['wrong_source'], allow_duplicates=True)
# enc_ctl_data.insert(loc = 8, column = 'dprime', value = demo_data['dprime'], allow_duplicates=True)
# enc_ctl_data.insert(loc = 9, column = 'associative_memScore', value = demo_data['associative_memScore'],
#                 allow_duplicates=True)    

# enc_ctl_data.to_csv(os.path.join(output_dir, 'SVC_withinSub_enc_ctl_wholeBrain.tsv'),
# sep='\t', header=True, index=False)

In [None]:
coef_ = sub_svc.coef_
# print(coef_.shape)
#Return voxel weights into a nifti image using the NiftiMasker
coef_img = spheres_masker.inverse_transform(coef_)

In [None]:
from sklearn.model_selection import ShuffleSplit
X, y = sigs00, sess00.events.ctl_miss_ws_cs
# print(f'Samples: {X.shape}, Labels: {y.shape}')
n_samples = X_train.shape[0]

cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)

scores = cross_val_score(ovrc, X, y, cv=cv)

print(f'Mean CV Accuracy Score: {scores.mean().round(3)}, STD = {round(scores.std(), 3)}')



In [None]:
cross_val_predict(ovrc, X_train, y_train, groups=y_train,
                  cv=10)

In [None]:
# ENCODING VERSUS CONTROL TASK CLASSIFICATION

# s_data = []

# for i in range(1, difumo.labels.shape[0]):
#     enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
#                         column = roi_names[i]+'_coef',
#                         value = NaN, allow_duplicates=True)

#     betas, sub_mask = alltrials_betas_A, sess00.mask_img
#     # initialize NiftiLabelMasker object    
#     sub_label_masker = NiftiLabelsMasker(labels_img=basc,
#                                          standardize=True,
#                                          mask_img=sub_mask,
#                                          verbose=0)

#     # transform subject's beta maps into vector of network means per trial
#     X_enc_ctl = sub_label_masker.fit_transform(betas)

#     # load subject's trial labels
#     y_enco_ctl = sess00.events.trial_type
#     # Split trials into a training and a test set
#     X_train, X_test, y_train, y_test = train_test_split(
#         X_enc_ctl, # x
#         y_enco_ctl, # y
#         test_size = 0.4, # 60%/40% split
#         shuffle = True, # shuffle dataset before splitting
#         stratify = y_enco_ctl, # keep distribution of conditions consistent betw. train & test sets
#         #random_state = 123  # if set number, same shuffle each time, otherwise randomization algo
#         ) 
#     print('\n\n'.join(['Training Set',
#                        f'Length: {len(X_train)}',
#                        f'Labels: {y_train.value_counts()}',
#                        'Testing Set',
#                        f'Length: {len(X_test)}',
#                        f'Labels: {y_test.value_counts()}']))

#     # define the model
#     sub_svc = SVC(kernel='linear', class_weight='balanced')

#     # Cross-validation to evaluate model performance
#     # within 10 folds of training set
#     # predict
#     y_pred = cross_val_predict(sub_svc, X_train, y_train,
#                                groups=y_train, cv=cv)
#     # scores
#     cv_acc = cross_val_score(sub_svc, X_train, y_train,
#                              groups=y_train, cv=cv)
#     print(f'Cross-Validation Accuracy Score: {cv_acc}')

#     for i in range(0, len(cv_acc)):
#         s_data.append(cv_acc[i])

#         # evaluate overall model performance on training data
#         overall_acc = accuracy_score(y_pred = y_pred, y_true = y_train)
#         overall_cr = classification_report(y_pred = y_pred, y_true = y_train)

#     #     s_data.append(overall_acc)
#         # Test model on unseen data from the test set
#         sub_svc.fit(X_train, y_train)
#         y_pred = sub_svc.predict(X_test) # classify age class using testing data
#         acc = sub_svc.score(X_test, y_test) # get accuracy
#         cr = classification_report(y_pred=y_pred, y_true=y_test) # get prec., recall & f1
#         s_data.append(acc)

#         # get coefficients
#         coef_ = sub_svc.coef_[0]
#         print('\n\n'.join([f'Accuracy: {acc}',
#                            f'Classification Report: {cr}',
#                            f'Overall Accuracy Score: {overall_acc}',
#                            f'Overall Classification Report: {overall_cr}',
#                            f'Number of Coefficients: {coef_.shape}',
#                            f'Coefficients: {coef_}']))
#         sub_basc = basc_labels.copy()
#         sub_basc.insert(loc=3, column='coef', value=coef_,
#                         allow_duplicates=True)

#     coef = sub_basc['coef']
#     for j in range(0, len(coef)):
#         s_data.append(coef[j])

#     sub_basc.sort_values(by='coef', axis = 0, ascending = False, inplace=True)
#     #print(sub_basc.iloc[0:12, 2:4])

#     enc_ctl_data = enc_ctl_data.append(pd.Series(s_data, index=enc_ctl_data.columns),
#                                        ignore_index=True)


demo_data = sub_data.copy()
demo_data.reset_index(level=None, drop=False, inplace=True)

enc_ctl_data.insert(loc = 1, column = 'cognitive_status',
                    value = demo_data['cognitive_status'], allow_duplicates=True)
enc_ctl_data.insert(loc = 2, column = 'total_scrubbed_frames',
                    value = demo_data['total_scrubbed_frames'], allow_duplicates=True)
enc_ctl_data.insert(loc = 3, column = 'mean_FD',
                    value = demo_data['mean_FD'], allow_duplicates=True)
enc_ctl_data.insert(loc = 4, column = 'hits', value = demo_data['hits'], allow_duplicates=True)
enc_ctl_data.insert(loc = 5, column = 'miss', value = demo_data['miss'], allow_duplicates=True)
enc_ctl_data.insert(loc = 6, column = 'correct_source',
                    value = demo_data['correct_source'], allow_duplicates=True)
enc_ctl_data.insert(loc = 7, column = 'wrong_source',
                    value = demo_data['wrong_source'], allow_duplicates=True)
enc_ctl_data.insert(loc = 8, column = 'dprime',
                    value = demo_data['dprime'], allow_duplicates=True)
enc_ctl_data.insert(loc = 9, column = 'associative_memScore',
                    value = demo_data['associative_memScore'], allow_duplicates=True)    

# enc_ctl_data.to_csv(os.path.join(output_dir, 'SVC_withinSub_enc_ctl_'+str(numnet)+'networks.tsv'),
#     sep='\t', header=True, index=False)


In [None]:
# sess00.events.recognition_performance.unique(),sess00.events.ctl_miss_ws_cs.unique()
# false_alarms = sess00.behav.where(sess00.behav.values=='FA').dropna(
#                    axis=1,how='all').dropna(axis=0,how='all').index
# sess00.behav.loc[false_alarms]
# sess00.behav.recognition_performance.unique()
# 'CR' in sess00.events.values

In [None]:
help(cross_val_score)

In [None]:
# from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
# from sklearn.model_selection import train_test_split
# from nilearn.input_data import NiftiSpheresMasker
# from sklearn.model_selection import cross_val_predict, train_test_split
# from sklearn.model_selection import cross_val_score, KFold
# from sklearn.metrics import accuracy_score, classification_report
# from sklearn.model_selection import GridSearchCV
# from sklearn.metrics import classification_report
# from sklearn.svm import SVC, LinearSVC
# # help(OneVsRestClassifier)

# cv = sess00.events.ctl_miss_ws_cs.value_counts().min() -1
# X_train, X_test, y_train, y_test = train_test_split(
#     sigs00,
#     sess00.events.ctl_miss_ws_cs,
#     test_size = 0.9,
#     shuffle = True,
#     stratify = sess00.events.recognition_performance)

# ovrc = OneVsRestClassifier(SVC(tol=0.0001, kernel='poly',
#                                cache_size=600,
#                                decision_function_shape='ovr',
#                                class_weight='balanced',
#                                break_ties=True,
# #                                gamma='auto', degree=3, coef0=0.0,
#                                probability=True))

# ovoc = OneVsOneClassifier(SVC(tol=0.0001, kernel='poly',
#                               cache_size=600,
#                               decision_function_shape='ovo',
#                               class_weight='balanced',
# #                               break_ties=True,
# #                               gamma='auto', degree=3, coef0=0.0,
#                               probability=True))

# ovrc.fit(X_train, y_train), ovoc.fit(X_train, y_train)

# ovo_cr = classification_report(y_pred=ovoc.predict(X_test),
#                                y_true=y_test, zero_division=0)
# ovr_cr = classification_report(y_pred=ovrc.predict(X_test),
#                                y_true=y_test, zero_division=0)

# ovr_cv_acc = cross_val_score(ovrc, X_test, y_test,
#                               groups=y_test, cv=cv)
# ovo_cv_acc = cross_val_score(ovoc, X_test, y_test,
#                              groups=y_test, cv=cv)

# print('\n\n'.join([f'Training: {len(y_train)}, Testing: {len(y_test)}',
#                    f'True Labels:\n{y_test.values}',
#                    f'OVR Predicted Labels:\n{ovrc.predict(X_test)}',
#                    f'OVO Predicted Labels:\n{ovoc.predict(X_test)}',                   
#                    ', '.join([f'OVR Accuracy: {ovrc.score(X_test,y_test)}',
#                               f'OVO Accuracy: {ovoc.score(X_test,y_test)}']),
#                    f'OVR Cross-Validation:\n{ovr_cv_acc}',
#                    f'OVO Cross-Validation:\n{ovo_cv_acc}',
#                    f'OVR CR:\n{ovr_cr}',
#                    f'OVO CR:\n{ovo_cr}']))
#               #\nPredictions:\n{ovoc.predict(X_test)}']))
# # help(SVC)

In [None]:
from nilearn.decoding import FREMRegressor
# help(FREMRegressor)
X_train, X_test, y_train, y_test = train_test_split(
    sigs00,
    sess00.events.trial_type,
    test_size = 0.4,
    shuffle = True,
    stratify = enc_ctl00)

frem = FREMRegressor(ovrc).fit(X_train,y_train)

# frem.fit(X_train, y_train)

print(frem.predict(X_test))

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    sigs00, # x
    sess00.events.trial_type, # y
    test_size = 0.4, # 60%/40% split
    shuffle = True, # shuffle dataset before splitting
    stratify = enc_ctl00, # keep distribution of conditions consistent betw. train & test sets
    #random_state = 123  # if set number, same shuffle each time, otherwise randomization algo
    )
# define the model
# kernel='linear', class_weight='balanced'
sub_svc = SVC()
cv = 15
sub_svc.fit(X_train, y_train)
y_pred = sub_svc.predict(X_test)
from sklearn.metrics import accuracy_score
cv_acc = cross_val_score(sub_svc, X_test, y_test,
                         groups=y_test, cv=cv)

cr = classification_report(y_pred=y_pred, y_true=y_test)
print('\n\n'.join([f'Cross-Validation Score: {cv_acc}',
                   f'Classification Report: {cr}']))
# print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

In [None]:
# from cimaq_decoding_utils import get_iso_labels
# enc_ctl_iso00 = get_iso_labels(session=sess00)
# sigs_iso00 = NiftiMasker().fit_transform(sess00.cleaned_fmri)

In [None]:
from operator import itemgetter
from typing import Iterable, Sequence


def flatten(nested_seq: Union[Iterable, Sequence]) -> list:
    """
    Return vectorized (1D) list from nested Sequence ``nested_seq``.
    """

    return [bottomElem for sublist in nested_seq for bottomElem
            in (flatten(sublist)
                if (isinstance(sublist, Sequence)
                    and not isinstance(sublist, str))
                else [sublist])]


def get_iso_labels(frame_times:Iterable=None,
                   t_r:float=None,
                   events:pd.DataFrame=None,
                   session:Union[dict, Bunch]=None,
                   **kwargs) -> list:
    """
    Return a list of which trial condition a given fMRI frame fits into.
    
    Args:
        frame_times: Iterable (default=None)
            Iterable array containing the onset of each fMRI frame
            since scan start.

        t_r: float (default=None)
            fMRI scan repetition time, in seconds.

        events: pd.DataFrame (default=None)
            DataFrame containing experimental procedure details.
            The required columns are the same as used by Nilearn functions
            (e.g. ``nilearn.glm.first_level.FirstLevelModel``).
            Those are ["onset", "duration", "trial_type"].
            Other columns are ignored.

        session: dict or Bunch (default=None)
            Dict or ``sklearn.utils.Bunch`` minimally containg
            all of the above parameters just like keyword arguments.

    Returns: list
        List of which trial condition a given fMRI frame fits into.
        The list is of the same lenght as ``frame_times``,
        suitable for classification and statistical operations.
    """
#     from cimaq_decoding_utils import flatten
    if session is not None:
        params = itemgetter(*('frame_times','t_r','events'))(session)
        frame_times,t_r,events = params

    frame_intervals = [pd.Interval(*item) for item in
                       tuple(zip(frame_times, frame_times+t_r))]
    trial_ends = (events.onset+events.duration).values
    trial_intervals = [pd.Interval(*item) for item in
                       tuple(zip(events.onset.values, trial_ends))]
    frame_intervals = [frame for frame in frame_intervals
                       if frame.left>trial_intervals[0].right]
    frame_intervals = [frame for frame in frame_intervals
                       if trial_intervals[-1].left<frame.left]    
#     return len(frame_intervals)

    bold_by_trial_indx = [[frame[0] for frame in
                           enumerate(frame_intervals)
                           if frame[1].right in trial]
                          for trial in trial_intervals]
#     return bold_by_trial_indx
#     labels = pd.Series([[label] for label in events.trial_type.tolist()])
    label_shapes = list(map(len,bold_by_trial_indx))
    return sum(label_shapes)
#     return np.array(flatten((labels*label_shapes).values))
    
#     [labels.extend(lst) for lst in
#     return np.array(flatten([[item[0]]*len(item[1]) for item in
#                      tuple(zip(events.trial_type.values,
#                                bold_by_trial_indx))]))
#     return np.array(labels)

# enc_ctl_iso00 = 
get_iso_labels(session=sess00)

# enc_ctl_iso00, enc_ctl_iso00.shape

In [None]:
enc_ctl00 = sess00.events.trial_type.values

In [None]:

# Cross-validation (10 folds)
# predict
# y_pred = cross_val_predict(sub_svc, X_test, y_test,
#                            groups=y_test, cv=cv)
# # scores


# for i in range(0, len(cv_acc)):
# #         s_data.append(cv_acc[i])
#     sub_svc.fit(X_train, y_train)
#     y_pred = sub_svc.predict(X_test)
#     # evaluate overall model performance on training data
#     overall_acc = accuracy_score(y_pred = y_pred,
#                                  y_true = y_test)
#     overall_cr = classification_report(y_pred = y_pred,
#                                        y_true = y_test)
#     acc = sub_svc.score(X_test, y_test)
#     cr = classification_report(y_pred=y_pred, y_true=y_test) # get prec., recall & f1
# #         s_data.append(acc)

#     # get coefficients
#     coef_ = sub_svc.coef_[0]

#     print('\n\n'.join([f'Accuracy: {acc}',
#                        f'Classification Report: {cr}',
#                        f'Overall Accuracy Score: {overall_acc}',
#                        f'Overall Classification Report: {overall_cr}',
#                        f'Number of Coefficients: {coef_.shape}',
#                        f'Coefficients: {coef_}'])) 
# display(y_pred, cv_acc)

In [None]:
from nilearn.input_data import NiftiSpheresMasker
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC, LinearSVC

difumo_cut_coords = pd.read_csv('/data/simexp/fnadeau/difumo_64_dims_3mm_cut_coords.tsv',
                                sep='\t', index_col='component')

spheres_masker = NiftiSpheresMasker(seeds=difumo_cut_coords[['x', 'y', 'z']].values,
                                    standardize='zscore').fit()
# mask_img=spheres_masker
# help(spheres_masker.fit)
signals = spheres_masker.transform_single_imgs(beta_list_test)


spacetime_sigs = pd.DataFrame(signals,
                              columns=difumo_cut_coords.difumo_names,
                              index=trials)

X_train, X_test, y_train, y_test = train_test_split(
    signals,
    sess00.events.trial_type.values,
    test_size = 0.3,
    shuffle = True,
    stratify = trials)
sub_svc = SVC(kernel='linear',
              class_weight='balanced',
#               C=0.1,
#               degree=1,
#               coef0=1,
              tol=0.05,
              probability=True,
              decision_function_shape='ovr',
#               break_ties=True,
              gamma='auto',
#               random_state=1,
              cache_size=600)
sub_svc.fit(X_train, y_train)
print(sub_svc.score(X_test, y_test))
# logreg = LogisticRegression(
#                             class_weight='balanced',
#                             max_iter=1000000,
# #                             tol=0.05,
#                             multi_class='ovr',
# #                             C=0.1,
# #                             solver='liblinear',
# #                             penalty='l2',
# #                             solver='liblinear'
#                            ).fit(X_train, y_train)
# logreg.score(X_test, y_test)
# logreg = LogisticRegression(class_weight='balanced').fit(list(itm[1] for itm in spacetime_sigs.iterrows()),
#                                                          spacetime_sigs.index)


# display(logreg.predict_log_proba(X_test),
#         logreg.predict_proba(X_test), logreg.score(X_test, y_test))

In [None]:
# import numpy as np
# import pandas as pd
# from nibabel.nifti1 import Nifti1Image
# from nilearn.glm.first_level import FirstLevelModel
# from nilearn.glm.first_level import make_first_level_design_matrix
# from nilearn.input_data import NiftiSpheresMasker
# from operator import itemgetter
# from os import PathLike
# from pathlib import PosixPath
# from sklearn.utils import Bunch
# from typing import Iterable, Union

# def make_betamap(fmri_img: Union[str, PathLike, PosixPath,
#                                  Nifti1Image] = None,
#                  events: Union[str, PathLike, PosixPath,
#                                pd.DataFrame] = None,
#                  mask_img: Union[bool, str, PathLike, PosixPath,
#                                  Nifti1Image] = False,
#                  fwhm: Union[float, int] = None,
#                  design_kws: Union[dict, Bunch] = None,
#                  glm_kws: Union[dict, Bunch] = None,
#                  masker_kws: Union[dict, Bunch] = None,
#                  session: Union[dict, Bunch] = None,
#                  **kwargs):

#     def get_frame_times(fmri_img:Nifti1Image):
#         return (np.arange(fmri_img.shape[-1]) *
#                 fmri_img.header.get_zooms()[-1])

#     if session is not None:
#         fmri_img, events, design_defs, glm_defs = \
#             itemgetter(*['cleaned_fmri', 'events',
#                          'design_defs', 'glm_defs'])(session)
    
#     else:
#         fmri_img = nilearn.image.load_img(fmri_img)
#         if not isinstance(events, pd.DataFrame):
#             events = pd.read_csv(events, sep=get_separator(events))
#         else:
#             events = events
#     events = preprocess_events(events, fmri_img)[['onset', 'duration',
#                                                   'condition']]
#     events = events.rename({'condition':'trial_type'}, axis=1)
#     if fwhm is not None:
#         fmri_img = nilearn.image.smooth_img(fmri_img, fwhm)
    
#     if design_kws is not None:
#         _params.design_defs.update(design_kws)
#     if masker_kws is not None:
#         _params.masker_defs.update(masker_kws)
#     if glm_kws is not None:
#         _params.glm_defs.update(glm_kws)

#     all_betas_filelist_A = []

#     for row in tqdm(list(events.iterrows()),
#                     desc='Computing Beta Map'):
#         trial_events = events.copy(deep = True)
#         conditions = ['X_'+trial_row[1].trial_type if
#                       trial_row[0]==row[0]
#                       else trial_row[1].trial_type
#                       for trial_row in trial_events.iterrows()]
#         trial_events['trial_type'] = conditions

#         design = [make_first_level_design_matrix(
#                       frame_times=get_frame_times(fmri_img),
#                       events=trial_events,
#                       **design_kws)
#                   if design_kws is not None
#                   else make_first_level_design_matrix(
#                       frame_times=get_frame_times(fmri_img),
#                       events=trial_events[['onset','duration','trial_type']])][0]

#         trial_model = [FirstLevelModel(**glm_kws).fit(
#                            fmri_img, design_matrices=design)
#                        if glm_kws is not None else
#                        FirstLevelModel().fit(
#                            fmri_img, design_matrices=design)][0]
#         contrast_vec = [int(trial.startswith('X_'))
#                         for trial in trial_events.trial_type.tolist()]

#         b_map = trial_model.compute_contrast(design.columns[row[0]],
#                                              output_type='effect_size')
#         all_betas_filelist_A.append(b_map)

#     return nilearn.image.concat_imgs(all_betas_filelist_A)

# betamap = make_betamap(session=sess00)
# # help(FirstLevelModel)
# #  | mask_img : Niimg-like, NiftiMasker object or False, optional
# #  |      Mask to be used on data. If an instance of masker is passed,
# #  |      then its mask will be used. If no mask is given,
# #  |      it will be computed automatically by a NiftiMasker with default
# #  |      parameters. If False is given then the data will not be masked.

In [None]:
def sub_tcontrasts1(session:Union[dict,Bunch]=None,
                    sub_id:str=None,
                    tr:float=None,
                    frame_times:list=None,
                    hrf_model:str=None,
                    events:pd.DataFrame=None,
                    fmri_img:Nifti1Image=None,
                    sub_outdir:Union[str,os.PathLike]=None,
                    glm_kws: Union[dict, Bunch] = None):
    """
    Create beta values maps using nilearn first-level model.

    The beta values correspond to the following contrasts between conditions:
    control, encoding, and encoding_minus_control

    Parameters:
    ----------
    sub_id: string (subject's dccsub_id)
    tr: float (length of time to repetition, in seconds)
    frames_times: list of float (onsets of fMRI frames, in seconds)
    hrf_model: string (type of HRF model)
    confounds: pandas dataframe (motion and other noise regressors)
    all_events: string (task information: trials' onset time, duration and label)
    fmrsub_idir: string (path to directory with fMRI data)
    outdir: string (path to subject's image output directory)

    Return:
    ----------
    None (beta maps are exported in sub_outdir)
    """

    if isinstance(session, dict):
        session = Bunch(**session)
    # Model 1: encoding vs control conditions
    events = preprocess_events(session.events, fmri_img)[['onset', 'duration',
                                                  'condition']]
    events = events.rename({'condition':'trial_type'}, axis=1)
#     events1 = session.events[['onset', 'duration',
#                               'trial_type']].copy(deep = True)
    if glm_kws is not None:
        session.glm_defs.update(glm_kws)
    all_betas_filelist_A = []

    for row in tqdm(list(events.iterrows()),
                    desc='Computing Beta Map'):
        trial_events = events.copy(deep = True)
        conditions = ['X_'+trial_row[1].trial_type if
                      trial_row[0]==row[0]
                      else trial_row[1].trial_type
                      for trial_row in trial_events.iterrows()]
        trial_events['trial_type'] = conditions     
    # create the model - Should data be standardized?
    model1 = FirstLevelModel(**session.glm_defs)

    # create the design matrices
    design1 = make_first_level_design_matrix(events=events1,
                                             frame_times=session.frame_times,
                                             **session.design_defs)

    # fit model with design matrix
    model1 = model1.fit(session.cleaned_fmri, design_matrices = design1)

    # Condition order: control, encoding (alphabetical)
    # contrast 1.1: control condition
    ctl_vec = np.repeat(0, design1.shape[1])
    ctl_vec[0] = 1
    b11_map = model1.compute_contrast(ctl_vec, output_type='effect_size') #"effect_size" for betas
    b11_name = f'betas_{session.sub_id}_ctl.nii'

    #contrast 1.2: encoding condition
    enc_vec = np.repeat(0, design1.shape[1])
    enc_vec[1] = 1
    b12_map = model1.compute_contrast(enc_vec, output_type='effect_size') #"effect_size" for betas
    b12_name = f'betas_{session.sub_id}_enc.nii'

    #contrast 1.3: encoding minus control
    encMinCtl_vec = np.repeat(0, design1.shape[1])
    encMinCtl_vec[1] = 1
    encMinCtl_vec[0] = -1
    b13_map = model1.compute_contrast(encMinCtl_vec, output_type='effect_size') #"effect_size" for betas
    b13_name = f'betas_{session.sub_id}_enc_minus_ctl.nii'
    contrasts = ((b11_map, b11_name), (b12_map, b12_name), (b13_map, b13_name))
    if sub_outdir is not None:
        savedir = os.path.join(sub_outdir, session.sub_id, session.ses_id)
        os.makedirs(savedir, exist_ok=True)
        [nibabel.save(*contrast) for contrast in contrasts]
    return contrasts

contrasts01 = sub_tcontrasts1(session=sess00, glm_kws={'standardize':True})

In [None]:
[itm[0].shape for itm in contrasts01]

In [None]:
# from nilearn.input_data import NiftiMapsMasker

# atlas_kws = dict(data_dir='/data/simexp/fnadeau/difumo_atlases/',
#                  resolution_mm=3, dimension=64)

# difumo=get_difumo(**atlas_kws)

# difumo_cut_coords = pd.read_csv('/data/simexp/fnadeau/difumo_64_dims_3mm_cut_coords.tsv',
#                                 sep='\t', index_col='component')


spheres_masker = NiftiSpheresMasker(seeds=difumo_cut_coords[['x', 'y', 'z']].values,
#                                     mask_img=maps_masker.mask_img,
                                    standardize=False,
                                    standardize_confounds=False).fit()

maps_masker = NiftiMapsMasker(maps_img=difumo.maps,
                              mask_img=spheres_masker.mask_img,
                              standardize_confounds=False,
                              resampling_target='data').fit()

maps_masker.__dict__.keys()
# help(NiftiMapsMasker)

**Version A: A separate model created for each trial.
Trial of interest modelled as its own condition, and other trials are modelled as two conditions: control, and encoding (excluding trial of interest)


In [None]:
# sorted(dir(inspect))
# help(inspect.getattr_static)
# attrgetter

# inspect.getattr_static(obj=nilearn.datasets,
#                        attr=f'fetch_atlas_{atlas_name}')
# atlas_name='difumo'
# atlas_caller = getattr(nilearn.datasets, f'fetch_atlas_{atlas_name}')
# atlas_caller(**atlas_kws)

In [None]:


get_difumo_cut_coords(64, 3, '/data/simexp/fnadeau/difumo_atlases/')

In [None]:
sess00.events

In [None]:
help(LogisticRegression)

In [None]:
help(sub_svc)

In [None]:
# SGDClassifier, LogisticRegressionCV
# X_train, X_test, y_train, y_test = train_test_split(
#     signals, trials, test_size = 0.4,
#     shuffle = True, stratify = trials)

# display(logreg.predict_log_proba(X_test), logreg.predict_proba(X_test),
#         logreg.predict_proba(X_test), logreg.score(X_test, y_test))
# help(logreg)

In [None]:
from sklearn.linear_model import LinearRegression, LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(
    session.computed_.signal_matrix, session.events.iloc[:, -1], test_size = 0.4,
    shuffle = True, stratify = session.events.iloc[:, -1])
linreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
sorted(linreg.__dict__.keys())
# sorted(dir(linreg))
# help(linreg.score)

# trial_pred = linreg.predict(X_test)
# linreg.score(linreg.predict(X_test), y_test)

In [None]:
spacetime_sigs = pd.DataFrame(X_enc_ctl,
                              columns=difumo_cut_coords.difumo_names,
                              index=sess00.events.trial_type)



In [None]:
y_enco_ctl = sess00.events.trial_type
spheres_masker.__dict__.keys()

In [None]:
# train_test_split ``stratify``: keep distribution of conditions consistent betw. train & test sets
# ``random_state`` = 123  # if set number, same shuffle each time, otherwise randomization algo
def classify_trial(betamap, masker, y, **kwargs):

    # transform subject's beta maps into vector of network means per trial
    X = masker.transform_single_imgs(betamap)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size = 0.4,
        shuffle = True, stratify = y)
    
    # Model
    sub_svc = SVC(kernel='linear', class_weight='balanced')
#     sub_svc.fit(X_train, y_train)

    # Cross-validation (10 folds)
    y_pred = cross_val_predict(sub_svc, X_train, y_train,
                               groups=y_train, cv=cv)
    # scores
    cv_acc = cross_val_score(sub_svc, X_train, y_train,
                             groups=y_train, cv=cv)
    summary_str = '\n\n'.join(['Training Set', f'Length: {len(X_train)}',
                               f'Labels:\n{y_train.value_counts()}',
                               'Testing Set', f'Length: {len(X_test)}',
                               f'Labels:\n{y_test.value_counts()}',
                               f'Cross-Validation Accuracy Score:\n{cv_acc}'])
    for i in range(0, len(cv_acc)):
#         s_data.append(cv_acc[i])
        sub_svc.fit(X_train, y_train)
        y_pred = sub_svc.predict(X_test)
        # evaluate overall model performance on training data
        overall_acc = accuracy_score(y_pred = y_pred,
                                     y_true = y_test)
        overall_cr = classification_report(y_pred = y_pred,
                                           y_true = y_test)
        acc = sub_svc.score(X_test, y_test)
        cr = classification_report(y_pred=y_pred, y_true=y_test) # get prec., recall & f1
#         s_data.append(acc)

        # get coefficients
        coef_, sub_basc = sub_svc.coef_[0], basc_labels.copy()
        sub_basc.insert(loc=3, column='coef', value=coef_,
                        allow_duplicates=True)

        print('\n\n'.join([f'Accuracy: {acc}',
                           f'Classification Report: {cr}',
                           f'Overall Accuracy Score: {overall_acc}',
                           f'Overall Classification Report: {overall_cr}',
                           f'Number of Coefficients: {coef_.shape}',
                           f'Coefficients: {coef_}']))       
    [s_data.append(sub_basc['coef'][j]) for j in range(1, len(sub_basc['coef']))]

    sub_basc.sort_values(by='coef', axis = 0, ascending = False, inplace=True)
    #print(sub_basc.iloc[0:12, 2:4])

    enc_ctl_data = enc_ctl_data.append(pd.Series(s_data, index=enc_ctl_data.columns),
                                       ignore_index=True)
    print(enc_ctl_data)

classify_trial(betamap=betamap,
               masker=spheres_masker,
               y=sess00.events.trial_type)

In [None]:
import nilearn
import os
import pandas as pd
from io import StringIO
from pathlib import Path
from sklearn.utils import Bunch
from typing import Union
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC
# from get_difumo import get_difumo

# difumo=get_difumo(data_dir='/data/simexp/fnadeau/difumo_atlases/',
#                   resolution_mm=3, dimension=64)

cv, basc_labels, s_data = 10, difumo.labels, []
# build data structure to store accuracy data and coefficients
enc_ctl_data = pd.DataFrame()
enc_ctl_data.insert(loc = 0, column = 'pscid', value = 'None',
                    allow_duplicates=True)
for i in range(0, cv):
    enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
                        column = 'CV'+str(i+1)+'_acc',
                        value = NaN, allow_duplicates=True)
    enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
                        column = 'TrainSet_MeanCV_acc',
                        value = 'None', allow_duplicates=True)
    enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
                        column = 'TestSet_acc',
                        value = 'None', allow_duplicates=True)
    roi_names = difumo_cut_coords.difumo_names.values


for i in range(1, difumo.labels.shape[0]):
    enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
                        column = roi_names[i]+'_coef',
                        value = NaN, allow_duplicates=True)

    betas, sub_mask = betamap, sess00.mask_img

    # transform subject's beta maps into vector of network means per trial
    X_enc_ctl = spheres_masker.transform_single_imgs(betas)

    # load subject's trial labels
    y_enco_ctl = sess00.events.trial_type
    # Split trials into a training and a test set
    X_train, X_test, y_train, y_test = train_test_split(
        X_enc_ctl, # x
        y_enco_ctl, # y
        test_size = 0.4, # 60%/40% split
        shuffle = True, # shuffle dataset before splitting
        stratify = y_enco_ctl, # keep distribution of conditions consistent betw. train & test sets
        #random_state = 123  # if set number, same shuffle each time, otherwise randomization algo
        )
    # define the model
    sub_svc = SVC(kernel='linear', class_weight='balanced')

    # Cross-validation (10 folds)
    # predict
    y_pred = cross_val_predict(sub_svc, X_train, y_train,
                               groups=y_train, cv=cv)
    # scores
    cv_acc = cross_val_score(sub_svc, X_train, y_train,
                             groups=y_train, cv=cv)
    print('\n\n'.join(['Training Set',
                       f'Length: {len(X_train)}',
                       f'Labels:\n{y_train.value_counts()}',
                       'Testing Set',
                       f'Length: {len(X_test)}',
                       f'Labels: {y_test.value_counts()}',
                       f'Cross-Validation Accuracy Score: {cv_acc}']))


In [None]:
# ENCODING VERSUS CONTROL TASK CLASSIFICATION

# from sklearn.model_selection import cross_val_predict, train_test_split
# from sklearn.model_selection import cross_val_score, KFold
# from sklearn.metrics import accuracy_score, classification_report
# from sklearn.model_selection import GridSearchCV
# from sklearn.metrics import classification_report
# from sklearn.svm import SVC

# for i in range(1, difumo.labels.shape[0]):
#     enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
#                         column = roi_names[i]+'_coef',
#                         value = NaN, allow_duplicates=True)

#     betas, sub_mask = betamap, sess00.mask_img

#     # transform subject's beta maps into vector of network means per trial
#     X_enc_ctl = spheres_masker.transform_single_imgs(betas)

#     # load subject's trial labels
#     y_enco_ctl = sess00.events.trial_type
#     # Split trials into a training and a test set
#     X_train, X_test, y_train, y_test = train_test_split(
#         X_enc_ctl, # x
#         y_enco_ctl, # y
#         test_size = 0.4, # 60%/40% split
#         shuffle = True, # shuffle dataset before splitting
#         stratify = y_enco_ctl, # keep distribution of conditions consistent betw. train & test sets
#         #random_state = 123  # if set number, same shuffle each time, otherwise randomization algo
#         )
#     # define the model
#     sub_svc = SVC(kernel='linear', class_weight='balanced')

#     # Cross-validation (10 folds)
#     # predict
#     y_pred = cross_val_predict(sub_svc, X_train, y_train,
#                                groups=y_train, cv=cv)
#     # scores
#     cv_acc = cross_val_score(sub_svc, X_train, y_train,
#                              groups=y_train, cv=cv)
#     print('\n\n'.join(['Training Set',
#                        f'Length: {len(X_train)}',
#                        f'Labels:\n{y_train.value_counts()}',
#                        'Testing Set',
#                        f'Length: {len(X_test)}',
#                        f'Labels: {y_test.value_counts()}',
#                        f'Cross-Validation Accuracy Score: {cv_acc}']))    

    for i in range(0, len(cv_acc)):
        s_data.append(cv_acc[i])
        sub_svc.fit(X_train, y_train)
        # evaluate overall model performance on training data
        overall_acc = accuracy_score(y_pred = sub_svc.predict(X_test),
                                     y_true = y_test)
        overall_cr = classification_report(y_pred = sub_svc.predict(X_test),
                                           y_true = y_test)

        s_data.append(overall_acc)
        # classify age class using testing data        
        y_pred = sub_svc.predict(X_test)
        acc = sub_svc.score(X_test, y_test) # get accuracy
        cr = classification_report(y_pred=y_pred, y_true=y_test) # get prec., recall & f1
        s_data.append(acc)

        # get coefficients
        coef_, sub_basc = sub_svc.coef_[0], basc_labels.copy()
        sub_basc.insert(loc=3, column='coef', value=coef_,
                        allow_duplicates=True)

        print('\n\n'.join([f'Accuracy: {acc}',
                           f'Classification Report: {cr}',
                           f'Overall Accuracy Score: {overall_acc}',
                           f'Overall Classification Report: {overall_cr}',
                           f'Number of Coefficients: {coef_.shape}',
                           f'Coefficients: {coef_}']))        
    [s_data.append(sub_basc['coef'][j]) for j in range(1, len(sub_basc['coef']))]

    sub_basc.sort_values(by='coef', axis = 0, ascending = False, inplace=True)
    #print(sub_basc.iloc[0:12, 2:4])

    enc_ctl_data = enc_ctl_data.append(pd.Series(s_data, index=enc_ctl_data.columns),
                                       ignore_index=True)
    print(enc_ctl_data)

In [None]:
# ENCODING VERSUS CONTROL TASK CLASSIFICATION

# s_data = []

# for i in range(1, difumo.labels.shape[0]):
#     enc_ctl_data.insert(loc = enc_ctl_data.shape[1],
#                         column = roi_names[i]+'_coef',
#                         value = NaN, allow_duplicates=True)

#     betas, sub_mask = alltrials_betas_A, sess00.mask_img
#     # initialize NiftiLabelMasker object    
#     sub_label_masker = NiftiLabelsMasker(labels_img=basc,
#                                          standardize=True,
#                                          mask_img=sub_mask,
#                                          verbose=0)

#     # transform subject's beta maps into vector of network means per trial
#     X_enc_ctl = sub_label_masker.fit_transform(betas)

#     # load subject's trial labels
#     y_enco_ctl = sess00.events.trial_type
#     # Split trials into a training and a test set
#     X_train, X_test, y_train, y_test = train_test_split(
#         X_enc_ctl, # x
#         y_enco_ctl, # y
#         test_size = 0.4, # 60%/40% split
#         shuffle = True, # shuffle dataset before splitting
#         stratify = y_enco_ctl, # keep distribution of conditions consistent betw. train & test sets
#         #random_state = 123  # if set number, same shuffle each time, otherwise randomization algo
#         ) 
#     print('\n\n'.join(['Training Set',
#                        f'Length: {len(X_train)}',
#                        f'Labels: {y_train.value_counts()}',
#                        'Testing Set',
#                        f'Length: {len(X_test)}',
#                        f'Labels: {y_test.value_counts()}']))

#     # define the model
#     sub_svc = SVC(kernel='linear', class_weight='balanced')

#     # Cross-validation to evaluate model performance
#     # within 10 folds of training set
#     # predict
#     y_pred = cross_val_predict(sub_svc, X_train, y_train,
#                                groups=y_train, cv=cv)
#     # scores
#     cv_acc = cross_val_score(sub_svc, X_train, y_train,
#                              groups=y_train, cv=cv)
#     print(f'Cross-Validation Accuracy Score: {cv_acc}')

#     for i in range(0, len(cv_acc)):
#         s_data.append(cv_acc[i])

#         # evaluate overall model performance on training data
#         overall_acc = accuracy_score(y_pred = y_pred, y_true = y_train)
#         overall_cr = classification_report(y_pred = y_pred, y_true = y_train)

#     #     s_data.append(overall_acc)
#         # Test model on unseen data from the test set
#         sub_svc.fit(X_train, y_train)
#         y_pred = sub_svc.predict(X_test) # classify age class using testing data
#         acc = sub_svc.score(X_test, y_test) # get accuracy
#         cr = classification_report(y_pred=y_pred, y_true=y_test) # get prec., recall & f1
#         s_data.append(acc)

#         # get coefficients
#         coef_ = sub_svc.coef_[0]
#         print('\n\n'.join([f'Accuracy: {acc}',
#                            f'Classification Report: {cr}',
#                            f'Overall Accuracy Score: {overall_acc}',
#                            f'Overall Classification Report: {overall_cr}',
#                            f'Number of Coefficients: {coef_.shape}',
#                            f'Coefficients: {coef_}']))
#         sub_basc = basc_labels.copy()
#         sub_basc.insert(loc=3, column='coef', value=coef_,
#                         allow_duplicates=True)

#     coef = sub_basc['coef']
#     for j in range(0, len(coef)):
#         s_data.append(coef[j])

#     sub_basc.sort_values(by='coef', axis = 0, ascending = False, inplace=True)
#     #print(sub_basc.iloc[0:12, 2:4])

#     enc_ctl_data = enc_ctl_data.append(pd.Series(s_data, index=enc_ctl_data.columns),
#                                        ignore_index=True)


demo_data = sub_data.copy()
demo_data.reset_index(level=None, drop=False, inplace=True)

enc_ctl_data.insert(loc = 1, column = 'cognitive_status',
                    value = demo_data['cognitive_status'], allow_duplicates=True)
enc_ctl_data.insert(loc = 2, column = 'total_scrubbed_frames',
                    value = demo_data['total_scrubbed_frames'], allow_duplicates=True)
enc_ctl_data.insert(loc = 3, column = 'mean_FD',
                    value = demo_data['mean_FD'], allow_duplicates=True)
enc_ctl_data.insert(loc = 4, column = 'hits', value = demo_data['hits'], allow_duplicates=True)
enc_ctl_data.insert(loc = 5, column = 'miss', value = demo_data['miss'], allow_duplicates=True)
enc_ctl_data.insert(loc = 6, column = 'correct_source',
                    value = demo_data['correct_source'], allow_duplicates=True)
enc_ctl_data.insert(loc = 7, column = 'wrong_source',
                    value = demo_data['wrong_source'], allow_duplicates=True)
enc_ctl_data.insert(loc = 8, column = 'dprime',
                    value = demo_data['dprime'], allow_duplicates=True)
enc_ctl_data.insert(loc = 9, column = 'associative_memScore',
                    value = demo_data['associative_memScore'], allow_duplicates=True)    

# enc_ctl_data.to_csv(os.path.join(output_dir, 'SVC_withinSub_enc_ctl_'+str(numnet)+'networks.tsv'),
#     sep='\t', header=True, index=False)


In [None]:
# view_img(mean_img(alltrials_betas_A),bg_img=anat0, threshold='auto')
alltrials_betas_A.shape

**Version B: A separate model created for each trial.
Trial of interest modelled as its own condition, and other trials (encoding and control, excluding trial of interest) are modelled as a single condition

In [None]:
outBeta_dir_B = '/Users/mombot/Documents/Simexp/CIMAQ/Data/Nistats/Betas/122922/OneModelPerTrial_B'
all_betas_filelist_B = []
####

#Create a design matrix, first level model and beta map for each encoding and control trial 
for i in range (0, numTrials):

    #copy all_events dataframe to keep the original intact
    events = all_events.copy(deep = True)

    #Determine trial number and condition (encoding or control)
    tnum = events.iloc[i, 6]
    currentCondi = events.iloc[i, 3]
    tname = events.iloc[i, 2]
        
    #Version A: (2 conditions modelled separately)
    #modify trial_type column to model only the trial of interest 
    for j in events.index:
        if events.loc[j, 'trial_number'] != tnum:
            events.loc[j, 'trial_type']= 'X_otherCondi'
            #X for condition to remain in alphabetical order: trial of interest, X_CTL, X_Enc
    #verify: what determines the order of columns in design matrix?    

    #remove unecessary columns    
    cols = ['onset', 'duration', 'trial_type']
    events = events[cols]
    
    #create the model
    s_model = FirstLevelModel(t_r=tr, drift_model = None, standardize = True, noise_model='ar1',
                               hrf_model = hrf_model)    
    #Should data be standardized?

    #create the design matrices
    design = make_first_level_design_matrix(frame_times,
                                            events=events,
                                            drift_model=None,
#                                             add_regs=confounds, 
                                            hrf_model=hrf_model)
    
    #fit model with design matrix
    s_model = s_model.fit(fmri_img, design_matrices = design)
    
    design_matrix = s_model.design_matrices_[0]
    
    #sanity check: print design matrices and corresponding parameter labels
    #plot outputed design matrix for visualization
    print(str(tnum), ' ', tname, ' ', design_matrix.columns[0:3])
    plot_design_matrix(design_matrix)
    plt.show()

    #Contrast vector: 1 in design matrix column that corresponds to trial of interest, 0s elsewhere
    contrast_vec = np.repeat(0, design_matrix.shape[1])
    contrast_vec[0] = 1

    #compute the contrast's beta maps with the model.compute_contrast() method,
    #based on contrast provided. 
    #https://nistats.github.io/modules/generated/nistats.first_level_model.FirstLevelModel.html
    b_map = s_model.compute_contrast(contrast_vec, output_type='effect_size') #"effect_size" for betas
    b_name = os.path.join(outBeta_dir_B, 'betas_sub'+str(id)+'_Trial'+str(tnum)+'_'+tname+'.nii')
    #export b_map .nii image in output directory
    nibabel.save(b_map, b_name)
    print(os.path.basename(b_name))
    all_betas_filelist_B.append(b_name)
    
alltrials_betas_B = nibabel.funcs.concat_images(images=all_betas_filelist_B, check_affines=True, axis=None)
print(alltrials_betas_B.shape)
nibabel.save(alltrials_betas_B, os.path.join(outBeta_dir_B, 'concat_all_betas_sub'+str(id)+'.nii'))



In [None]:
outBeta_dir = '/Users/mombot/Documents/Simexp/CIMAQ/Data/Nistats/Betas/122922/Conditions_Contrasts'

#Model 1: encoding vs control conditions
events1 = all_events.copy(deep = True)
cols = ['onset', 'duration', 'condition']
events1 = events1[cols]
events1.rename(columns={'condition':'trial_type'}, inplace=True)

print(events1.head())

#create the model
model1 = FirstLevelModel(t_r=tr, drift_model = None, standardize = True, noise_model='ar1',
                         hrf_model = hrf_model)    
#Should data be standardized?

#create the design matrices
design1 = make_first_level_design_matrix(frame_times, events=events1,
                                        drift_model=None, add_regs=confounds, 
                                        hrf_model=hrf_model)

#fit model with design matrix
model1 = model1.fit(fmri_img, design_matrices = design1)    

design_matrix1 = model1.design_matrices_[0]    
plot_design_matrix(design_matrix1)
plt.show()
print(design_matrix1.columns[0:5])

#Condition order: control, encoding (alphabetical)

#contrast 1.1: control condition
ctl_vec = np.repeat(0, design_matrix1.shape[1])
ctl_vec[0] = 1
b11_map = model1.compute_contrast(ctl_vec, output_type='effect_size') #"effect_size" for betas
b11_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_ctl.nii')
nibabel.save(b11_map, b11_name)
print(os.path.basename(b11_name))
print(ctl_vec)

#contrast 1.2: encoding condition
enc_vec = np.repeat(0, design_matrix1.shape[1])
enc_vec[1] = 1
b12_map = model1.compute_contrast(enc_vec, output_type='effect_size') #"effect_size" for betas
b12_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_enc.nii')
nibabel.save(b12_map, b12_name)
print(os.path.basename(b12_name))
print(enc_vec)

#contrast 1.3: encoding minus control 
encMinCtl_vec = np.repeat(0, design_matrix1.shape[1])
encMinCtl_vec[1] = 1
encMinCtl_vec[0] = -1
b13_map = model1.compute_contrast(encMinCtl_vec, output_type='effect_size') #"effect_size" for betas
b13_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_enc_minus_ctl.nii')
nibabel.save(b13_map, b13_name)
print(os.path.basename(b13_name))
print(encMinCtl_vec)


In [None]:
#Model 2: missed vs hit encoding trials
events2 = all_events.copy(deep = True)
cols2 = ['onset', 'duration', 'ctl_miss_hit']
events2 = events2[cols2]
events2.rename(columns={'ctl_miss_hit':'trial_type'}, inplace=True)

print(events2.iloc[0:15, :])

#create the model
model2 = FirstLevelModel(t_r=tr,
                         drift_model = None,
                         standardize = True,
                         noise_model='ar1',
                         hrf_model = hrf_model)    
#Should data be standardized?

#create the design matrices
design2 = make_first_level_design_matrix(frame_times, events=events2,
                                         drift_model=None,
                                         add_regs=confounds, 
                                         hrf_model=hrf_model)

#fit model with design matrix
model2 = model2.fit(fmri_img, design_matrices = design2)    

design_matrix2 = model2.design_matrices_[0]    
plot_design_matrix(design_matrix2)
plt.show()
print(design_matrix2.columns[0:5])

##Condition order: control, hit, missed (alphabetical)

#contrast 2.1: miss 
miss_vec = np.repeat(0, design_matrix2.shape[1])
miss_vec[2] = 1
b21_map = model2.compute_contrast(miss_vec, output_type='effect_size') #"effect_size" for betas
b21_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_miss.nii')
nibabel.save(b21_map, b21_name)
print(os.path.basename(b21_name))
print(miss_vec)

#contrast 2.2: hit 
hit_vec = np.repeat(0, design_matrix2.shape[1])
hit_vec[1] = 1
b22_map = model2.compute_contrast(hit_vec, output_type='effect_size') #"effect_size" for betas
b22_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_hit.nii')
nibabel.save(b22_map, b22_name)
print(os.path.basename(b22_name))
print(hit_vec)

#contrast 2.3: hit minus miss
hit_min_miss_vec = np.repeat(0, design_matrix2.shape[1])
hit_min_miss_vec[1] = 1
hit_min_miss_vec[2] = -1
b23_map = model2.compute_contrast(hit_min_miss_vec, output_type='effect_size') #"effect_size" for betas
b23_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_hit_minus_miss.nii')
nibabel.save(b23_map, b23_name)
print(os.path.basename(b23_name))
print(hit_min_miss_vec)

#contrast 2.4: hit minus control
hit_min_ctl_vec = np.repeat(0, design_matrix2.shape[1])
hit_min_ctl_vec[1] = 1
hit_min_ctl_vec[0] = -1
b24_map = model2.compute_contrast(hit_min_ctl_vec, output_type='effect_size') #"effect_size" for betas
b24_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_hit_minus_ctl.nii')
nibabel.save(b24_map, b24_name)
print(os.path.basename(b24_name))
print(hit_min_ctl_vec)

#contrast 2.5: miss minus control 
miss_min_ctl_vec = np.repeat(0, design_matrix2.shape[1])
miss_min_ctl_vec[2] = 1
miss_min_ctl_vec[0] = -1
b25_map = model2.compute_contrast(miss_min_ctl_vec, output_type='effect_size') #"effect_size" for betas
b25_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_miss_minus_ctl.nii')
nibabel.save(b25_map, b25_name)
print(os.path.basename(b25_name))
print(miss_min_ctl_vec)


In [None]:
#Model 3: correct source vs wrong source encoding trials
events3 = all_events.copy(deep = True)
cols3 = ['onset', 'duration', 'ctl_miss_ws_cs']
events3 = events3[cols3]
events3.rename(columns={'ctl_miss_ws_cs':'trial_type'}, inplace=True)

print(events3.iloc[0:15, :])

#create the model
model3 = FirstLevelModel(t_r=tr,
                         drift_model=None,
                         standardize=True,
                         noise_model='ar1',
                         hrf_model=hrf_model)    
#Should data be standardized?

#create the design matrices
design3 = make_first_level_design_matrix(frame_times, events=events3,
                                        drift_model=None,
                                         add_regs=confounds, 
                                        hrf_model=hrf_model)

#fit model with design matrix
model3 = model3.fit(fmri_img, design_matrices = design3)    

design_matrix3 = model3.design_matrices_[0]    
plot_design_matrix(design_matrix3)
plt.show()
print(design_matrix3.columns[0:5])

##Condition order: control, correct source, missed, wrong source (alphabetical)

#contrast 3.1: wrong source 
ws_vec = np.repeat(0, design_matrix3.shape[1])
ws_vec[3] = 1
b31_map = model3.compute_contrast(ws_vec, output_type='effect_size') #"effect_size" for betas
b31_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_ws.nii')
nibabel.save(b31_map, b31_name)
print(os.path.basename(b31_name))
print(ws_vec)

#contrast 3.2: correct source
cs_vec = np.repeat(0, design_matrix3.shape[1])
cs_vec[1] = 1
b32_map = model3.compute_contrast(cs_vec, output_type='effect_size') #"effect_size" for betas
b32_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_cs.nii')
nibabel.save(b32_map, b32_name)
print(os.path.basename(b32_name))
print(cs_vec)

#contrast 3.3: correct source minus wrong source
cs_minus_ws_vec = np.repeat(0, design_matrix3.shape[1])
cs_minus_ws_vec[1] = 1
cs_minus_ws_vec[3] = -1
b33_map = model3.compute_contrast(cs_minus_ws_vec, output_type='effect_size') #"effect_size" for betas
b33_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_cs_minus_ws.nii')
nibabel.save(b33_map, b33_name)
print(os.path.basename(b33_name))
print(cs_minus_ws_vec)

#contrast 3.4: correct source minus miss
cs_minus_miss_vec = np.repeat(0, design_matrix3.shape[1])
cs_minus_miss_vec[1] = 1
cs_minus_miss_vec[2] = -1
b34_map = model3.compute_contrast(cs_minus_miss_vec, output_type='effect_size') #"effect_size" for betas
b34_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_cs_minus_miss.nii')
nibabel.save(b34_map, b34_name)
print(os.path.basename(b34_name))
print(cs_minus_miss_vec)

#contrast 3.5: wrong source minus miss
ws_minus_miss_vec = np.repeat(0, design_matrix3.shape[1])
ws_minus_miss_vec[3] = 1
ws_minus_miss_vec[2] = -1
b35_map = model3.compute_contrast(ws_minus_miss_vec, output_type='effect_size') #"effect_size" for betas
b35_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_ws_minus_miss.nii')
nibabel.save(b35_map, b35_name)
print(os.path.basename(b35_name))
print(ws_minus_miss_vec)

#contrast 3.6: correct source minus control
cs_minus_ctl_vec = np.repeat(0, design_matrix3.shape[1])
cs_minus_ctl_vec[1] = 1
cs_minus_ctl_vec[0] = -1
b36_map = model3.compute_contrast(cs_minus_ctl_vec, output_type='effect_size') #"effect_size" for betas
b36_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_cs_minus_ctl.nii')
nibabel.save(b36_map, b36_name)
print(os.path.basename(b36_name))
print(cs_minus_ctl_vec)

#contrast 3.7: wrong source minus control
ws_minus_ctl_vec = np.repeat(0, design_matrix3.shape[1])
ws_minus_ctl_vec[3] = 1
ws_minus_ctl_vec[0] = -1
b37_map = model3.compute_contrast(ws_minus_ctl_vec, output_type='effect_size') #"effect_size" for betas
b37_name = os.path.join(outBeta_dir, 'betas_sub'+str(id)+'_ws_minus_ctl.nii')
nibabel.save(b37_map, b37_name)
print(os.path.basename(b37_name))
print(ws_minus_ctl_vec)


In [None]:
#Visualize beta maps

#plotting brain images in nilearn:
#http://nilearn.github.io/plotting/index.html

#define directory where subject's functional mask and anatomical scan reside
anat_dir = '/Users/mombot/Documents/Simexp/CIMAQ/Data/anat/122922'
#subject's anatomical scan
anat = nibabel.load(os.path.join(anat_dir, 'anat_sub122922_nuc_stereonl.nii'))
plot_anat(anat)

beta_list = glob.glob(os.path.join(outBeta_dir, '*.nii'))

for beta in beta_list:
    print(beta)
    plot_img(beta)
    plot_stat_map(stat_map_img=beta, bg_img=anat, cut_coords=(0, 0, 0), threshold=0.001, colorbar=True)
    show()
