[WIP] Initial implementation for learning curve creation #132
Conversation
CI does not pass due to #133.
@jsosulski Pinning sklearn would probably be the right call. (Edit: I did so in 009d6fa.) @alexandrebarachant was active in some muse-lsl PR recently, but no word on when my fix pyRiemann/pyRiemann#93 will be merged.
This can be used in custom evaluations
The initial implementation works so far, @sylvchev. Notable change from our discussion: the number of permutations can now be specified per data size (see n_perms below). The current example script runs for 1.5 minutes.
I pruned some more run time from the example script and made the code conform to flake8. @sylvchev, we talked about an option to set the dataset sizes using, e.g., the absolute number of samples per class. For example:
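(A hypothetical sketch, reusing the data_size dict format documented further down in this thread; the exact keys are still under discussion.)

import numpy as np

# Hypothetical 'per_class' policy: draw an absolute number of samples per
# class, here 16, then 32, then 64 samples of each class.
data_size = {"policy": "per_class", "value": np.array([16, 32, 64])}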
Should these constraints also be applied to the test data? E.g., let's say I have a class imbalance of 1 Target : 5 Non-Target; should the classifier train on 1:1 but then be evaluated on 1:5?
elapsed_time = time.time() - start_time
print(f"Elapsed time: {elapsed_time/60} minutes.")
No need to use time in the pushed version; it is automatically added by Sphinx. You can see it at the bottom of the documentation pages.
Oh right, thanks! By the way, how is the documentation generated? I do not see anything in GitHub Actions that looks like documentation generation. If I knew the command, I could build the docs on my machine for testing.
import time

start_time = time.time()
You could revert these changes; the run time is automatically indicated by Sphinx.
:param n_perms: Number of permutations to perform. If an array
    is passed it has to be equal in size to the data_size array.
:param data_size: Contains the policy to pick the data sizes to
    evaluate, as well as the actual values. The dict has the
    key 'policy' with either 'ratio' or 'per_class', and the key
    'value' with the actual values as a numpy array.
You should use the MNE/scikit-learn formatting for the docstring; here is an example: https://github.com/NeuroTechX/moabb/blob/master/moabb/evaluations/base.py#L92
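For reference, a sketch of the same parameters in numpydoc style (wording taken from the docstring above; the exact type annotations are up to you):

Parameters
----------
n_perms : int | np.ndarray
    Number of permutations to perform. If an array is passed,
    it has to be equal in size to the data_size array.
data_size : dict
    Policy to pick the data sizes to evaluate, as well as the actual
    values: key 'policy' with either 'ratio' or 'per_class', and key
    'value' with the sizes as a numpy array.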
def __init__(self, n_perms: Union[int, np.ndarray] = 20,
             data_size: Optional[dict] = None, **kwargs):
Using Python type hints is very nice, and this is a first in MOABB. It could be nice to gradually update the whole code base.
# TODO implement per-class sampling. Not yet clear -> how to sample test set?


class WithinSessionEvaluationIncreasingData(BaseEvaluation):
I think it is better to keep only a few evaluation classes and to dispatch code depending on the parameters. Here, you could modify the evaluate function of WithinSessionEvaluation to something like:
def __init__(self, n_perms: Union[int, np.ndarray, None] = None,
             data_size: Optional[dict] = None, **kwargs):
    """..."""

def _evaluate(self, dataset, pipelines):
    # original evaluate() code
    ...

def _evaluate_increasing(self, dataset, pipelines):
    # your evaluate() code
    ...

def evaluate(self, dataset, pipelines):
    if self.n_perms is not None:
        return self._evaluate_increasing(dataset, pipelines)
    return self._evaluate(dataset, pipelines)
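Usage would then be identical for both modes; a hypothetical call (assuming paradigm, datasets, and pipelines are set up as in the existing MOABB examples, with illustrative parameter values) could look like:

# n_perms set -> learning-curve mode; n_perms=None -> original behaviour
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=datasets, n_perms=20,
    data_size={"policy": "ratio", "value": np.array([0.2, 0.5, 1.0])},
)
results = evaluation.process(pipelines)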
from tdlda import Vectorizer as JumpingMeansVectorizer
from tdlda import TimeDecoupledLda as TDLDA
The example is very nice, but examples (especially those introducing a feature) should be reproducible without requiring the download of any external code (other than MOABB and requirements.txt).
That being said, I found it very interesting to have advanced examples relying on external open source code, like you did.
Could you make two examples from your code?
- one simple example with just RG (no JM + LDA/TD-LDA), as sketched below
- one advanced example with the Jumping Means and Time-Decoupled LDA, where you could provide further explanation of your method, point to your papers, and indicate how to install the code
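For the first one, a minimal sketch of a pure Riemannian geometry pipeline (the pipeline name, parameters, and classifier choice are illustrative, not part of this PR):

from pyriemann.estimation import XdawnCovariances
from pyriemann.tangentspace import TangentSpace
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipelines = {
    "RG + LogReg": make_pipeline(
        XdawnCovariances(nfilter=4, estimator="oas"),  # spatially filtered covariance estimates
        TangentSpace(),                                # project covariances to the tangent space
        LogisticRegression(),                          # linear classifier on tangent vectors
    )
}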
I agree, and I think having two examples is a very good idea.
I just figured pyRiemann is an external dependency as well, as it is only needed for the examples. Maybe the requirements could be split into requirements_core and requirements_extra, or something similar, to indicate what you need to install when you just want to use moabb vs. running the examples.
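One hypothetical way to express such a split with setuptools extras instead of separate requirements files (group names and package lists are illustrative, nothing here is decided):

# setup.py -- sketch of a core/extra dependency split
from setuptools import find_packages, setup

setup(
    name="moabb",
    packages=find_packages(),
    install_requires=["numpy", "scipy", "scikit-learn", "mne"],  # core
    extras_require={"extra": ["pyriemann"]},  # only needed to run the examples
)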
I also have some additional points that we could discuss in the moabb office hours next week. (Thursday evening unfortunately does not work for me this week; most weeks, actually.)
Kudos for the example, this is very nice! Thank you so much for keeping the computation time low.
I'm wondering why the code merged from PR #127 is marked as new, as that PR has been merged into master.
Thanks for the review, @sylvchev! I addressed some of your comments directly in the review.
I think we could safely squash-merge this PR directly into master; the parts that were modified for additional_columns have not been changed since.
After merging the black-reformatted master into my ongoing PR, I noticed my branch was suddenly 93 commits ahead of master. To prevent confusion (and work on my part :-)), I will close this PR and open a new one: #155.
Once this PR is finalized, it closes #129.
For now, this PR should be seen as a proposal. I implemented this as a new evaluation class, but it could be merged into the normal WithinSessionEvaluation if desired. Currently the PR is based only on master, but I would like to have PR #127 merged so that I am not forced to parse the session string to find out which permutation / dataset size a result corresponds to.
Please extend this list with necessary tasks / edit current tasks.
Tasks: