New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logical_types validation check #49
Conversation
if not isinstance(logical_types, dict): | ||
raise TypeError('logical_types must be a dictionary') | ||
cols_not_found = set(logical_types.keys()).difference(set(dataframe.columns)) | ||
if cols_not_found: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure the logical types they are providing are actually in our Library
from data_tables.logical_types import LogicalType
for l_type in logical_types:
assert l_type in LogicalType.__subclasses__()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that thought crossed my mind as well. Do you think it would be better to add this check here or during the DataColumn creation? If we add it here, we probably also should add it so that check happens when creating a DataColumn too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe 1 check is better? Do it in DataColumn only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking - we check the keys here because they aren't used in DataColumn, but we check for a valid logical type in DataColumn when we need to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated DataColumn init to check for an invalid LogicalType.
} | ||
error_message = re.escape("logical_types contains columns that are not present in dataframe: ['birthday', 'occupation']") | ||
with pytest.raises(LookupError, match=error_message): | ||
_check_logical_types(sample_df, bad_logical_types_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test for setting a logical type that doesn't exist, like Numeric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test
Codecov Report
@@ Coverage Diff @@
## main #49 +/- ##
===========================================
- Coverage 57.03% 43.47% -13.56%
===========================================
Files 8 8
Lines 128 184 +56
===========================================
+ Hits 73 80 +7
- Misses 55 104 +49
Continue to review full report at Codecov.
|
… logical-types-validation
…Labs/datatables into logical-types-validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Closes #41
Added new private function in
data_table.py
to validate the information inlogical_types
if it is provided during creation of aDataTable
. This check verifies that a dictionary is passed and that all of the keys in the provided dictionary are valid columns in the underlying dataframe.