data split confusion #1

@code4luck

Hello, and thanks for the great work. I am confused about the train/test data split. `do_kfold.py` and `do_stratified_kfold.py` use sklearn's `KFold` to split the data randomly, producing only a train set and a test set. For the S645 dataset, the paper states: "It is crucial to consider these non-binders as outliers during model development and evaluation to ensure model accuracy and robustness." Does handling these outliers mean directly deleting the entries with ddG == 8 from S645? Finally, could you please provide more detailed documentation on training and inference? Thanks.
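
To make the question concrete, here is a minimal sketch of one possible reading, in which the ddG == 8 entries are dropped before the random K-fold split. This is only an illustration of what I am asking about, not the repository's actual code: the `ddg` array is made-up toy data, and `NON_BINDER_DDG`, `keep_idx`, and `folds` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for the S645 labels (made-up values, not real data).
ddg = np.array([0.5, 8.0, -1.2, 2.3, 8.0, 0.1, 1.7, -0.4, 3.0, 0.9])

# Assumption from the question: non-binders are the entries labelled ddG == 8.
NON_BINDER_DDG = 8.0

# One possible reading of "handling outliers": remove them before splitting.
keep_idx = np.flatnonzero(ddg != NON_BINDER_DDG)

# Random K-fold split into train and test indices only.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
folds = []
for train_pos, test_pos in kf.split(keep_idx):
    # Map positions within keep_idx back to indices in the original dataset.
    folds.append((keep_idx[train_pos], keep_idx[test_pos]))
```

The other reading would keep these entries in the dataset and treat them differently during training or evaluation, which is why I would like to know which one the paper means.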
