Add markdown documentation for the data profiling example #34

Merged
merged 6 commits into from Sep 27, 2018

sscdotopen
Contributor

Issue #9

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sscdotopen sscdotopen requested a review from a team September 27, 2018 05:10
@stefan-grafberger
Contributor

Should we link the corresponding classes in the README version? We don't mention some of the advanced configuration options we offer here; at the same time, this keeps the example very short and easy to understand. However, should we maybe link the corresponding classes for the profiles (and mention the approximate percentiles) and for the RunBuilder?

Also, would it make sense to show the JSON version of a profile and mention the convenience function we offer for that conversion?

We could also consider adding examples of how to visualize these profiles. What do you think?

@sscdotopen
Contributor Author

It would be great to have a second example that shows the advanced options and gives hints on visualization. Could you create an issue for that?

Where exactly should we add links? Each example always links to its Scala file, and we already have a link to the example package in README.md.

@stefan-grafberger
Contributor

stefan-grafberger commented Sep 27, 2018

Maybe something like this?

We return two different types of profiles: StandardColumnProfiles and NumericColumnProfiles. If the cardinality of a column is low enough, we also include a histogram with the StandardColumnProfile.

We also have more advanced configuration options as part of the ColumnProfilerRunner API, see here. We will also include an example of a more advanced use case in the future.

However, even without this, it should be quite easy to find these classes in the code.
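For reference, here is a rough sketch of how the two kinds of profiles and the optional histogram might be consumed. It assumes a local Spark session, deequ on the classpath, and a made-up three-column dataset; the object name `ProfilingSketch` and the sample rows are purely illustrative, and field names may differ between deequ versions.

```scala
import org.apache.spark.sql.SparkSession
import com.amazon.deequ.profiles.{ColumnProfilerRunner, NumericColumnProfile}

// Hypothetical standalone sketch; not part of the example added in this PR.
object ProfilingSketch extends App {

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("profiling-sketch")
    .getOrCreate()

  import spark.implicits._

  // Made-up sample data: one numeric column and two string columns.
  val data = Seq(
    ("thingA", 13.0, "IN_TRANSIT"),
    ("thingB", 5.0, "DELAYED"),
    ("thingC", 7.5, "IN_TRANSIT")
  ).toDF("name", "totalNumber", "status")

  // Profile all columns with the default settings.
  val result = ColumnProfilerRunner()
    .onData(data)
    .run()

  result.profiles.foreach { case (column, profile) =>
    // Fields available on every profile.
    println(s"Column '$column': completeness=${profile.completeness}, " +
      s"approxDistinct=${profile.approximateNumDistinctValues}, " +
      s"dataType=${profile.dataType}")

    // The histogram is only present when the column's cardinality is low enough.
    profile.histogram.foreach { distribution =>
      distribution.values.foreach { case (value, entry) =>
        println(s"\t$value occurred ${entry.absolute} times (ratio ${entry.ratio})")
      }
    }

    // Numeric columns carry additional statistics, each wrapped in an Option.
    profile match {
      case numeric: NumericColumnProfile =>
        println(s"\tmin=${numeric.minimum}, max=${numeric.maximum}, mean=${numeric.mean}")
      case _ => // non-numeric column, nothing more to print
    }
  }

  spark.stop()
}
```

Pattern matching on `NumericColumnProfile` avoids an explicit cast, so the same loop handles both kinds of profiles.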

@sscdotopen sscdotopen merged commit a2ad7eb into master Sep 27, 2018
@sscdotopen sscdotopen deleted the deequ-9 branch September 27, 2018 06:10