-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit test based on titanic dataset for column profiling. #191
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for the contribution. There are a couple of minor issues that I would ask you to address. The build also failed because of some style violations, please have a look at that as well.
error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Whitespace at end of line line=408 column=0
error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Whitespace at end of line line=416 column=0
error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Whitespace at end of line line=422 column=0
error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Insert a space after the start of the comment line=415 column=58
src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, two tiny style issues, then we are good to merge!
assert(actual.dataType == expected.dataType, msg) | ||
assert(actual.completeness >= expected.completeness, msg) | ||
assert(actual.isDataTypeInferred == expected.isDataTypeInferred, msg) | ||
if(expected.approximateNumDistinctValues > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after if please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
891, | ||
DataTypeInstances.Integral, | ||
false, | ||
Map(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Map.empty instead of Map() here and in the subsequent calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Thank you for the contribution! |
Thank you for the help :-). |
More tests on more datasets are always welcome, but I dont have a specific dataset in mind. You can alternatively have a look at the open issues and check if there is something interesting for you. |
*Issue #22
Description of changes:
New unit test for column profiling based on external file with titanic dataset.
It doesn't fix any problems but I think it can be useful as simple example how to create this kind of tests based on the external datasets. It shows where to put the dataset, how to read it and how to add information about its source. Also it has 891 rows so is bigger than other smaller test cases.
The only not exact value is number of unique values in 'PassengerId` column. should be 891 but is approximated as 887. So probably it is ok.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.