Enable column profile histogram analysis for columns of numeric types #97

paulsukow
Contributor

Fix for issue: #95

Description of changes:
Histogram analysis for column profiling was limited to boolean and string columns. However, categorical data can also be encoded with numeric values (e.g., 1 indicating male and 2 indicating female). This change enables histogram analysis for columns with short, long, double, float, and integer data types.
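
To illustrate the idea, here is a minimal, self-contained sketch in plain Scala. It is not the code from this PR: the names `HistogramSketch`, `ColumnKind`, `shouldComputeHistogram`, and `histogram`, as well as the cardinality threshold parameter, are hypothetical and chosen only for illustration. It assumes a histogram is computed whenever a column has few distinct values, now regardless of whether its values are boolean, string, or numeric.

```scala
object HistogramSketch {

  // Hypothetical, simplified column kinds; the real project has its own type model.
  sealed trait ColumnKind
  case object BooleanKind extends ColumnKind
  case object StringKind extends ColumnKind
  case object IntegralKind extends ColumnKind   // stands in for short, integer, long
  case object FractionalKind extends ColumnKind // stands in for float, double

  // Kinds eligible for a histogram before the change described above.
  val kindsBefore: Set[ColumnKind] = Set(BooleanKind, StringKind)

  // Kinds eligible after the change: numeric kinds are included as well.
  val kindsAfter: Set[ColumnKind] = kindsBefore ++ Set(IntegralKind, FractionalKind)

  // A histogram is only worthwhile for low-cardinality columns.
  // `threshold` is an assumed, caller-supplied limit on distinct values.
  def shouldComputeHistogram(
      kind: ColumnKind,
      approxDistinct: Long,
      threshold: Long,
      eligible: Set[ColumnKind] = kindsAfter): Boolean =
    eligible.contains(kind) && approxDistinct <= threshold

  // Counts occurrences per value, keyed by the value's string form,
  // so numeric codes such as 1 and 2 are treated as categories.
  def histogram[A](values: Seq[A]): Map[String, Long] =
    values.groupBy(_.toString).map { case (key, group) => key -> group.size.toLong }
}
```

With the example from the description, `HistogramSketch.histogram(Seq(1, 2, 1, 1, 2))` counts three occurrences of the code `1` and two of the code `2`, while a numeric column with many distinct values would be skipped by `shouldComputeHistogram`.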

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Contributor

@sscdotopen sscdotopen left a comment

Hi Paul,

The PR looks really good, I only have a few "cosmetic" requests :)

@sscdotopen
Contributor

I checked our coding style (https://github.com/databricks/scala-style-guide#indent), and you are right that we don't need to indent long method invocations with 4 spaces. Sorry for that. I deleted the corresponding comments.

@paulsukow
Contributor Author

@sscdotopen Ok, I made those changes

@paulsukow
Contributor Author

@sscdotopen Can you re-review this PR?

@sscdotopen sscdotopen merged commit 15b2006 into awslabs:master Apr 25, 2019
@sscdotopen
Contributor

Looks good, thank you!

@paulsukow paulsukow deleted the feature/columnProfileHistogramsFromNumericDataWithLowCardinality branch April 25, 2019 14:38