Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit test based on titanic dataset for column profiling. #191

Merged
merged 4 commits into from
Feb 12, 2020

Conversation

klangner
Copy link
Contributor

*Issue #22

Description of changes:

New unit test for column profiling based on external file with titanic dataset.
It doesn't fix any problems but I think it can be useful as simple example how to create this kind of tests based on the external datasets. It shows where to put the dataset, how to read it and how to add information about its source. Also it has 891 rows so is bigger than other smaller test cases.

The only not exact value is number of unique values in 'PassengerId` column. should be 891 but is approximated as 887. So probably it is ok.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@sscdotopen sscdotopen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the contribution. There are a couple of minor issues that I would ask you to address. The build also failed because of some style violations, please have a look at that as well.

error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Whitespace at end of line line=408 column=0

error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Whitespace at end of line line=416 column=0

error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Whitespace at end of line line=422 column=0

error file=/home/travis/build/awslabs/deequ/src/test/scala/com/amazon/deequ/profiles/ColumnProfilerTest.scala message=Insert a space after the start of the comment line=415 column=58

Copy link
Contributor

@sscdotopen sscdotopen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, two tiny style issues, then we are good to merge!

assert(actual.dataType == expected.dataType, msg)
assert(actual.completeness >= expected.completeness, msg)
assert(actual.isDataTypeInferred == expected.isDataTypeInferred, msg)
if(expected.approximateNumDistinctValues > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

891,
DataTypeInstances.Integral,
false,
Map(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Map.empty instead of Map() here and in the subsequent calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@sscdotopen sscdotopen merged commit 394ecc1 into awslabs:master Feb 12, 2020
@sscdotopen
Copy link
Contributor

Thank you for the contribution!

@klangner
Copy link
Contributor Author

Thank you for the help :-).
I guess there is no need at the moment for another dataset?
If not I will look for another easy task, or if you think that any specific dataset could be useful just let me know.

@sscdotopen
Copy link
Contributor

More tests on more datasets are always welcome, but I dont have a specific dataset in mind. You can alternatively have a look at the open issues and check if there is something interesting for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants