Add Lo splitter #9
Also, please comment on the code style. I haven't used a formatter yet.
Thanks for the work @SteshinSS ! I'm very grateful for the documentation and test cases! 🚀
My main question is whether we can simplify the code by using some of the functionality of Datamol.
For the formatting, we use ruff, which is (mostly) the same as black.
Thank you for the thorough review. The code is very complex, so any way to simplify it would be helpful. I'll return with the revision.
Changes in the new commit:
Please review this version, and if it looks fine, I will include documentation in this PR.
Thank you very much for your contribution @SteshinSS!
I really like the LO splitting method.
I left a few comments.
Thanks, guys, for the review! It seems that the PR has mostly converged. Please check my old tutorial on the Lo splitter. I'm going to add more details, remove the k-fold examples, and rewrite the code using Splito and datamol datasets. What do you think about that? Is there anything that you would like to see in the documentation? Is there anything that could be clearer?
Hi @SteshinSS, thanks for the nice work so far! As to the documentation, I would focus on the why rather than on the what. For the most part I find the code self-explanatory and won't need comments that explain to me what's happening. Instead, I would like to understand why it has been implemented in a certain way and - zooming out - what the reasoning is behind this splitting method to begin with. You can of course always refer to your paper, but I would assume people have not read it.
Done :)
Wow, @SteshinSS ! You went above and beyond for the docs! Thanks a lot for the great work! 📚🚀
I have some suggestions to simplify the code for the test case and then one final comment: I recently merged #12. Could you integrate the Lo splitter with the changes from this PR? It would be great if people could do:
X_train, X_test, y_train, y_test = train_test_split(X, y, smiles=smiles, method="lo")
One question here: Given that the Lo splitter returns multiple test sets, what would you suggest the values of X_test and y_test look like? Should they just be lists of arrays?
@SteshinSS Maybe worth saying that I didn't test the suggested changes above. They're off the top of my head.
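To make the "lists of arrays" question concrete, here is a minimal sketch of what such a return value could look like. It is not Splito's API: `split_with_clusters` and `take` are hypothetical helpers, the indices are made up rather than produced by a real splitter, and plain Python lists stand in for arrays so the example has no dependencies.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take(items: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Select the elements of `items` at the given positions."""
    return [items[i] for i in indices]


def split_with_clusters(X, y, train_idx, cluster_idx):
    """Hypothetical helper: one train set plus one test set per cluster."""
    X_train, y_train = take(X, train_idx), take(y, train_idx)
    # Each cluster becomes its own test set, so X_test and y_test are lists of lists.
    X_test = [take(X, idx) for idx in cluster_idx]
    y_test = [take(y, idx) for idx in cluster_idx]
    return X_train, X_test, y_train, y_test


# Toy data; the indices below are invented for illustration only.
X = ["a", "b", "c", "d", "e", "f"]
y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
train_idx = [0, 1]
cluster_idx = [[2, 3], [4, 5]]

X_train, X_test, y_train, y_test = split_with_clusters(X, y, train_idx, cluster_idx)
```

With this shape, callers that expect a single test set would need to iterate over `X_test` and `y_test` pairwise, which is the trade-off the question raises.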
I just checked, and I'm not sure we want to add There are many splitters with a similar interface, accepting
If we add
As an alternative, Have you considered a functional approach? For example, splito could offer not only classes but also pure functions for splitting, similar to |
I like the idea of a functional module! That's indeed maybe the better way to go. Thanks for the elaborate suggestion and considerations! However, I don't want to scope creep this PR too much, so could you maybe open an issue with your idea and then we can merge this PR! Thanks again 🙏
Please just make sure the code is properly formatted with Ruff!
I believe it could be merged :)
Hi, all. @cwognum suggested adding the Lo-Hi splitter to Splito. Here is the first part of it.
Changelogs
If the code seems ok, I'll add documentation and update this PR.
Checklist:
feature, fix or test (or ask a maintainer to do it for you).