-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to read and write arrow files #948
Conversation
tuethan1999
commented
May 25, 2021
- Allow woodwork to read and write arrow files
- Closes Add support for reading Arrow file (read_file) #578
Codecov Report
@@ Coverage Diff @@
## main #948 +/- ##
==========================================
- Coverage 99.92% 99.92% -0.01%
==========================================
Files 45 46 +1
Lines 7553 7548 -5
==========================================
- Hits 7547 7542 -5
Misses 6 6
Continue to review full report at Codecov.
|
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good, just a comment about maybe creating a file for the read file tests
woodwork/tests/utils/test_utils.py
Outdated
@@ -310,6 +310,70 @@ def test_read_file_parquet_no_params(sample_df_pandas, tmpdir): | |||
pd.testing.assert_frame_equal(df_from_parquet, schema_df) | |||
|
|||
|
|||
def test_read_file_arrow(sample_df_pandas, tmpdir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're building up a fair number of read_file
tests. Maybe worth giving them their own file in tests/utils
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also might be able to parameterize at least some of the read_file
tests to reduce the number of separate tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably combine the read_file/read_file_no_params for the parquet, arrow, and feather formats. The original test_read_file_no_params
which tests csv looks different though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frances-h were you suggesting using pytest.parametrize? Something like we do in featuretools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as another option to splitting them out since a lot of the tests are basically the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split them out in my latest commit. It's not as pretty but definitely cuts down on repetitive code.