
Scaling factors for metafeatures should not be learned using test data #701

@bjschoenfeld

In `kND.py`, the `_scale` method uses the test data (`other`) to learn the min and max values used to scale the metafeatures. It is typically not a good idea to learn any parameters from test data, because doing so leaks information about the test set into the model:

```python
# `other` holds the test metafeatures, so they widen the learned scaling range
mins = pd.DataFrame(data=[mins, other]).min()
maxs = pd.DataFrame(data=[maxs, other]).max()
```
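A minimal sketch of the difference, assuming the metafeatures are held in a pandas DataFrame with one row per training dataset. The helpers `fit_min_max` and `apply_min_max` and the metafeature values are hypothetical, not taken from `kND.py`. It compares bounds learned from the training rows only with the leaky variant quoted above, where the test row widens the range and therefore always lands inside [0, 1].

```python
from typing import Tuple

import pandas as pd


def fit_min_max(train_metafeatures: pd.DataFrame) -> Tuple[pd.Series, pd.Series]:
    """Learn per-metafeature min and max from the training rows only (hypothetical helper)."""
    return train_metafeatures.min(), train_metafeatures.max()


def apply_min_max(metafeatures, mins: pd.Series, maxs: pd.Series):
    """Scale a row (Series) or table (DataFrame) with previously learned bounds (hypothetical helper)."""
    span = (maxs - mins).replace(0, 1)  # constant metafeatures: avoid dividing by zero
    return (metafeatures - mins) / span


# Made-up metafeatures: one row per training dataset.
train = pd.DataFrame({"n_rows": [100, 200, 300], "n_cols": [5, 10, 15]})
test_row = pd.Series({"n_rows": 400, "n_cols": 10})

# Train-only bounds: the test row does not influence the scaling range.
mins, maxs = fit_min_max(train)
scaled = apply_min_max(test_row, mins, maxs)

# Leaky variant mirroring the quoted code: the test row is folded into the bounds.
leaky_mins = pd.DataFrame(data=[mins, test_row]).min()
leaky_maxs = pd.DataFrame(data=[maxs, test_row]).max()
leaky = apply_min_max(test_row, leaky_mins, leaky_maxs)

# n_rows scales to 1.5 with train-only bounds, but to 1.0 with the leaky bounds.
print(scaled["n_rows"], leaky["n_rows"])
```

With train-only bounds, a test value outside the training range scales outside [0, 1]; whether to clip such values is a separate design decision.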

Would it be a good idea to use sklearn's `MinMaxScaler` anyway?
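If `MinMaxScaler` were adopted, the train-only behavior could look roughly like the sketch below. The metafeature tables are made-up examples; `fit` learns the per-column bounds from the training rows, and `transform` reuses those bounds for the test rows.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up metafeatures: one row per dataset, one column per metafeature.
train = pd.DataFrame({"n_rows": [100, 200, 300], "n_cols": [5, 10, 15]})
test = pd.DataFrame({"n_rows": [400], "n_cols": [10]})

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaler.fit(train)        # bounds are learned from the training rows only
test_scaled = scaler.transform(test)  # test rows reuse the training bounds

print(scaler.data_min_, scaler.data_max_)  # per-column bounds from train only
print(test_scaled)  # n_rows lands outside [0, 1] because 400 exceeds the training max
```

Recent scikit-learn versions also accept `clip=True` in the constructor if out-of-range test values should be clamped to `feature_range`.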