Nb/tabular/increase speed outlier detection + timeout #1414

Merged
merged 14 commits into from May 12, 2022

Nadav-Barak
Collaborator

@Nadav-Barak Nadav-Barak commented May 10, 2022

Major increase in running speed due to vectorization and other cool NumPy tricks. Also added a timeout mechanism.

It is still not that fast (7 sec for 10K samples with dimension 13).
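
For readers unfamiliar with the approach, here is a minimal sketch of what "vectorized Gower distance plus a timeout" can look like. It is not the code from this PR: the function names (`gower_to_all`, `mean_knn_distance`), the `k` and `timeout` parameters, and the split into separate numeric and categorical arrays are all assumptions made for illustration, and null handling is left out.

```python
import time

import numpy as np


def gower_to_all(num, cat, i, ranges):
    """Gower distance from row i to every row, with no Python loop over rows.

    Hypothetical helper: `num` is (n, n_num) float, `cat` is (n, n_cat).
    """
    # Numeric features: absolute difference scaled by each feature's range.
    num_diff = np.abs(num - num[i]) / ranges
    # Categorical features: 0 where the values match, 1 otherwise.
    cat_diff = (cat != cat[i]).astype(float)
    # Gower distance is the mean of the per-feature dissimilarities.
    return np.concatenate([num_diff, cat_diff], axis=1).mean(axis=1)


def mean_knn_distance(num, cat, k=3, timeout=10.0):
    """Mean distance to the k nearest neighbours; raises if over the time budget."""
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero on constant features
    start = time.monotonic()
    scores = np.empty(num.shape[0])
    for i in range(num.shape[0]):
        # Assumed timeout policy: check the elapsed time once per sample.
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"exceeded {timeout} seconds after {i} samples")
        dist = gower_to_all(num, cat, i, ranges)
        # The k + 1 smallest distances include the zero distance to row i itself.
        nearest = np.partition(dist, k)[: k + 1]
        scores[i] = nearest.sum() / k
    return scores
```

The loop over samples remains, which keeps memory at O(n) per step instead of building the full n × n matrix; only the inner per-row comparison is vectorized. Usage would be along the lines of `mean_knn_distance(num, cat, k=3, timeout=10.0)`, with more than `k` rows in the data.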

@Nadav-Barak Nadav-Barak requested review from a team, ItayGabbay and shir22 as code owners May 10, 2022 14:23
@Nadav-Barak Nadav-Barak linked an issue May 10, 2022 that may be closed by this pull request
deepchecks/utils/gower_distance.py (outdated)
@@ -88,3 +57,26 @@ def test_mix_columns_full_matrix_with_nulls():
assert_that(dist[-1], has_length(data.shape[0]))
assert_that(dist[3], has_item(greater_than(0.01)))
assert_that(min(dist[0]), less_than_or_equal_to(0))


def test_mix_columns_nn_matrix_with_nulls_vectorized():
Collaborator

Good idea. Are gower's dependencies OK?

Collaborator Author

The dependencies were only added to dev-requirements, so as far as I understand they are not installed with the package. Am I correct? @matanper

Collaborator

Yep, that's right. Just making sure that the testing env will be functional.

@noamzbr
Collaborator

noamzbr commented May 11, 2022

Re-add the check to the integrity suite.

@Nadav-Barak
Collaborator Author

Already added

@Nadav-Barak Nadav-Barak merged commit f4d1e6d into main May 12, 2022
@delete-merged-branch delete-merged-branch bot deleted the NB/Tabular/increase_speed_oulier_detection branch May 12, 2022 08:58
Development

Successfully merging this pull request may close these issues.

[BUG] [Tabular] outlier detection is slow
3 participants