Add logical_types validation check #49

Merged
merged 6 commits into from Sep 3, 2020

@thehomebrewnerd thehomebrewnerd commented Sep 3, 2020

Closes #41

Added a new private function in data_table.py that validates logical_types when it is provided during creation of a DataTable. The check verifies that a dictionary is passed and that every key in the dictionary is a valid column in the underlying dataframe.

if not isinstance(logical_types, dict):
    raise TypeError('logical_types must be a dictionary')
cols_not_found = set(logical_types.keys()).difference(set(dataframe.columns))
if cols_not_found:
Contributor

Let's make sure the logical types they are providing are actually in our library.

from data_tables.logical_types import LogicalType

# Check the dictionary values (the logical types), not the column-name keys.
for l_type in logical_types.values():
    assert l_type in LogicalType.__subclasses__()

Contributor Author

Yeah, that thought crossed my mind as well. Do you think it would be better to add this check here or during the DataColumn creation? If we add it here, we should probably also add it so that the check happens when creating a DataColumn too.

Contributor

@gsheni gsheni Sep 3, 2020

Maybe 1 check is better? Do it in DataColumn only?

Contributor Author

That's what I was thinking - we check the keys here because they aren't used in DataColumn, but we check for a valid logical type in DataColumn when we need to use it.

Contributor Author

Updated DataColumn init to check for an invalid LogicalType.
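The diff for that change is not quoted in this thread, so the following is only a rough sketch of what a check in `DataColumn.__init__` might look like. Everything here is hypothetical: the stand-in `LogicalType` base class and its subclasses, the simplified `DataColumn` signature, and the choice of `TypeError` with this message are assumptions for illustration, not the library's actual code.

```python
class LogicalType:
    """Hypothetical base class standing in for the library's LogicalType."""


class Double(LogicalType):
    """Hypothetical logical type used only for this example."""


class NaturalLanguage(LogicalType):
    """Hypothetical logical type used only for this example."""


class DataColumn:
    """Simplified stand-in; the real class is assumed to hold more state."""

    def __init__(self, name, logical_type=None):
        self.name = name
        # Reject anything that is not a direct subclass of LogicalType,
        # for example a plain string or an unrelated class.
        if (logical_type is not None
                and logical_type not in LogicalType.__subclasses__()):
            # The exception type and message are assumptions.
            raise TypeError(f'Invalid logical type specified for {name!r}')
        self.logical_type = logical_type


class Numeric:
    """Not a LogicalType subclass, so DataColumn should reject it."""


# A known logical type is accepted.
valid = DataColumn('age', logical_type=Double)

# An unknown one is rejected.
try:
    DataColumn('age', logical_type=Numeric)
except TypeError as err:
    error_text = str(err)
```

One caveat with this approach: `__subclasses__()` returns only direct subclasses, so a logical type that inherits from another logical type would not pass this membership test without extra handling.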

}
error_message = re.escape("logical_types contains columns that are not present in dataframe: ['birthday', 'occupation']")
with pytest.raises(LookupError, match=error_message):
    _check_logical_types(sample_df, bad_logical_types_keys)
Contributor

Add a test for setting a logical type that doesn't exist, like Numeric

Contributor Author

Added test

codecov bot commented Sep 3, 2020

Codecov Report

Merging #49 into main will decrease coverage by 13.55%.
The diff coverage is 15.15%.

@@             Coverage Diff             @@
##             main      #49       +/-   ##
===========================================
- Coverage   57.03%   43.47%   -13.56%     
===========================================
  Files           8        8               
  Lines         128      184       +56     
===========================================
+ Hits           73       80        +7     
- Misses         55      104       +49     
Impacted Files                   Coverage Δ
data_tables/data_column.py       14.54% <13.20%> (-14.03%) ⬇️
data_tables/data_table.py        24.24% <18.18%> (-2.43%) ⬇️
data_tables/tests/conftest.py    71.42% <50.00%> (-8.58%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 89d090e...43c3d55.

Contributor

@gsheni gsheni left a comment

lgtm

@gsheni gsheni merged commit 1dd4cc3 into main Sep 3, 2020
@gsheni gsheni deleted the logical-types-validation branch September 3, 2020 18:23
@gsheni gsheni mentioned this pull request Sep 24, 2020
Development

Successfully merging this pull request may close these issues.

Check that columns specified in logical_types are present in dataframe
2 participants