Roadmap #3

Open
LeonieBorne opened this issue Jun 15, 2020 · 4 comments

@LeonieBorne
Owner

LeonieBorne commented Jun 15, 2020

Roadmap

This issue contains the roadmap of this project. It is a starting point for finding issues that you can contribute to.

Please note that the list of proposed tutorials is by no means exhaustive. If you wish to add or modify any of them, do not hesitate to suggest it by creating a new issue!

General

Here is a (non-exhaustive) list of points to be dealt with before/during/after the tutorials have been written.

Tutorial 0. Introduction #5

The objective of this introductory tutorial is to explain the general principles of cross-decomposition algorithms, their possible applications and practical considerations. It should introduce and refer to the other tutorials.

This tutorial should also give an overview of the different cross-decomposition algorithms that exist, including CCA, PLS regression, PLS canonical, PLS-PM (for more than two blocks of variables), etc.
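As a possible starting point for the introduction, here is a minimal sketch of the core idea shared by these methods: find one weight vector per block so that the projected scores of the two blocks covary as much as possible. It uses an SVD of the cross-covariance matrix (the idea behind PLS correlation / PLS-SVD), written with NumPy only. The function name `pls_svd`, the toy data, and the loadings are illustrative assumptions, not part of this project or of scikit-learn. CCA differs in that it maximizes correlation rather than covariance.

```python
import numpy as np


def pls_svd(X, Y, n_components=2):
    """Hypothetical helper: weights from the SVD of the cross-covariance of X and Y."""
    Xc = X - X.mean(axis=0)                 # center each variable
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)        # cross-covariance, shape (p, q)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Columns of U / rows of Vt are paired weight vectors, ordered by covariance
    return U[:, :n_components], Vt[:n_components].T, s[:n_components]


# Toy data (assumption): one latent variable drives both blocks
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 1))
X = latent @ np.array([[1.0, -0.8, 0.6, 0.4, -0.2]]) + 0.5 * rng.normal(size=(n, 5))
Y = latent @ np.array([[0.9, -0.7, 0.5]]) + 0.5 * rng.normal(size=(n, 3))

Wx, Wy, s = pls_svd(X, Y, n_components=2)
x_scores = (X - X.mean(axis=0)) @ Wx        # latent scores for block X
y_scores = (Y - Y.mean(axis=0)) @ Wy        # latent scores for block Y
r_first = np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1]
print("singular values:", s)
print("correlation of first score pair:", r_first)
```

The singular values are the covariances of the successive score pairs, so the first pair captures the strongest shared pattern between the two blocks.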

Useful references

  • Cross-decomposition in scikit-learn: scikit-learn documentation for the cross-decomposition module (CCA, PLS regression, PLS canonical). Note that the documentation should be updated soon (see current pull request, corresponding branch).
  • PLS-PM: Sanchez, Gaston. "PLS Path Modeling with R."
  • PLS-PM in Python
  • PLS methods for neuroimaging: Krishnan, Anjali, et al. "Partial Least Squares (PLS) methods for neuroimaging: a tutorial and review." Neuroimage 56.2 (2011): 455-475.
  • CCA for neuroscientists: Wang, Hao-Ting, et al. "Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists." NeuroImage (2020)

Tutorial 1. Data preprocessing #6

This tutorial focuses on the minimal data preprocessing usually required, as for most machine-learning methods, including among other things:

  • z-scoring of each variable,
  • outlier detection,
  • missing values processing,
  • deconfounding procedures.
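The steps listed above could be sketched as follows, using NumPy only. This is a minimal illustration under stated assumptions, not the tutorial's final method: the helper names are hypothetical, missing values are filled with column means, confounds are removed by ordinary least-squares residualisation, and the outlier threshold of 3 standard deviations is an arbitrary choice. In a cross-validation setting, the means, standard deviations, and regression coefficients would have to be estimated on the training data only.

```python
import numpy as np


def impute_mean(X):
    """Replace NaNs with the mean of the observed values in each column."""
    X = np.array(X, dtype=float)            # copy, so the input is not modified
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def deconfound(X, confounds):
    """Remove the linear effect of the confounds (plus intercept) by OLS."""
    Z = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ beta                     # residuals


def zscore(X):
    """Center each variable and scale it to unit standard deviation."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                     # avoid dividing by zero
    return (X - X.mean(axis=0)) / std


def flag_outliers(Xz, threshold=3.0):
    """Flag samples with any z-scored value beyond the (assumed) threshold."""
    return np.any(np.abs(Xz) > threshold, axis=1)


# Toy data (assumption): four variables contaminated by an "age" confound
rng = np.random.default_rng(0)
n = 100
age = rng.normal(size=n)
X = rng.normal(size=(n, 4)) + 2.0 * age[:, None]
X[3, 1] = np.nan                            # one missing value

X_clean = zscore(deconfound(impute_mean(X), age))
outliers = flag_outliers(X_clean)
print("samples flagged as outliers:", int(outliers.sum()))
```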

Useful references

  • Section 5.1 CCA for neuroscientists: Wang, Hao-Ting, et al. "Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists." NeuroImage (2020)

Tutorial 2. Data reduction #7

This tutorial focuses on dimensionality-reduction techniques (PCA, ICA, etc.) that can provide useful data preprocessing when the number of variables exceeds the number of samples.
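For instance, a PCA step before the cross-decomposition could look like the sketch below, written with NumPy's SVD. The function name `pca_reduce`, the toy data sizes, and the choice of 10 components are illustrative assumptions; in practice the number of components kept is itself a modelling decision.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Hypothetical helper: project X onto its first principal components."""
    Xc = X - X.mean(axis=0)                         # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # principal axes, (k, p)
    scores = Xc @ components.T                      # reduced data, (n, k)
    explained = (s ** 2) / np.sum(s ** 2)           # variance ratio per component
    return scores, components, explained[:n_components]


# Toy data (assumption): 30 samples, 500 variables, so p >> n
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 500))

scores, components, explained = pca_reduce(X, n_components=10)
print("reduced shape:", scores.shape)
print("variance explained by 10 components:", explained.sum())
```

The reduced `scores` matrix has fewer columns than samples and can then be passed to CCA or PLS in place of the original block.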

Useful references

  • Section 5.2 CCA for neuroscientists: Wang, Hao-Ting, et al. "Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists." NeuroImage (2020)

Tutorial 3. Model selection #8

This tutorial introduces the different techniques used to evaluate, validate, and select the model.

  • How to choose the optimal number of latent sources of variation to be extracted?
  • How to evaluate the contribution of each individual input variable to the overall modeling solution?
  • How to compare the models?
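One way to approach the first question is cross-validation: fit the weights on training folds and check how strongly each pair of latent scores still correlates on held-out samples. The sketch below does this with NumPy only, using an SVD of the cross-covariance as the model. The helper names, the toy data, and the out-of-sample correlation criterion are assumptions chosen for illustration; other criteria, such as the permutation inference cited in the references, may be preferable.

```python
import numpy as np


def fit_weights(Xc, Yc, n_components):
    """Weights from the SVD of the cross-product of two centered blocks."""
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, :n_components], Vt[:n_components].T


def cv_score_correlations(X, Y, n_components, n_folds=5, seed=0):
    """Mean held-out correlation of each score pair across folds."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    corrs = np.zeros((n_folds, n_components))
    for i, test in enumerate(folds):
        train = np.setdiff1d(np.arange(n), test)
        # Centering and weights are estimated on the training fold only
        mx, my = X[train].mean(axis=0), Y[train].mean(axis=0)
        Wx, Wy = fit_weights(X[train] - mx, Y[train] - my, n_components)
        xs = (X[test] - mx) @ Wx
        ys = (Y[test] - my) @ Wy
        for k in range(n_components):
            corrs[i, k] = np.corrcoef(xs[:, k], ys[:, k])[0, 1]
    return corrs.mean(axis=0)


# Toy data (assumption): a single latent variable is shared by both blocks
rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=(n, 1))
X = latent @ np.array([[1.0, -0.8, 0.6, 0.4, -0.2, 0.3]]) + 0.5 * rng.normal(size=(n, 6))
Y = latent @ np.array([[0.9, -0.7, 0.5, 0.2]]) + 0.5 * rng.normal(size=(n, 4))

mean_corr = cv_score_correlations(X, Y, n_components=3)
print("held-out correlation per component:", mean_corr)
```

With a single shared latent variable, only the first component should keep a high held-out correlation, which suggests retaining one component for this toy data.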

Useful references

  • Section 5.3 CCA for neuroscientists: Wang, Hao-Ting, et al. "Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists." NeuroImage (2020)
  • Section 4.6 PLS-PM: Sanchez, Gaston. "PLS Path Modeling with R."
  • Comparison CCA/PLS: Rahim, Mehdi, Bertrand Thirion, and Gaël Varoquaux. "Multi-output predictions from neuroimaging: assessing reduced-rank linear models." 2017 International Workshop on Pattern Recognition in Neuroimaging (PRNI). IEEE, 2017.
  • Permutation inference for CCA: Winkler, Anderson M., et al. "Permutation inference for Canonical Correlation Analysis." arXiv preprint arXiv:2002.10046 (2020).
@htwangtw
Collaborator

I will add one more paper to Tutorial 3:
Permutation Inference for Canonical Correlation Analysis
https://arxiv.org/abs/2002.10046
I would like to implement this method; the authors provide MATLAB code and pseudocode.
However, the implementation could be a project of its own.

@fBeyer89

fBeyer89 commented Jun 16, 2020

I would suggest this tutorial as a reference for PLS: "Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review". It's quite old, but I found it really helpful when I was implementing PLS.

@htwangtw
Collaborator

Let's welcome @fBeyer89 @bbuckova @diiobo @nadinespy from EMEA to the project!

@LeonieBorne
Owner Author

@htwangtw @fBeyer89 Thank you so much for the references! I have added them to Tutorials 0 and 3.

3 participants