Add support to read and write arrow files #948

Merged 17 commits on Jun 9, 2021

tuethan1999
Contributor

codecov bot commented May 25, 2021

Codecov Report

Merging #948 (f78ea1c) into main (51ed6db) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #948      +/-   ##
==========================================
- Coverage   99.92%   99.92%   -0.01%     
==========================================
  Files          45       46       +1     
  Lines        7553     7548       -5     
==========================================
- Hits         7547     7542       -5     
  Misses          6        6              
Impacted Files                                  Coverage Δ
woodwork/tests/utils/test_utils.py              100.00% <ø> (ø)
woodwork/deserialize.py                         100.00% <100.00%> (ø)
woodwork/serialize.py                           100.00% <100.00%> (ø)
woodwork/tests/accessor/test_serialization.py   100.00% <100.00%> (ø)
woodwork/tests/utils/test_read_file.py          100.00% <100.00%> (ø)
woodwork/utils.py                               100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Ethan Tu and others added 2 commits May 25, 2021 14:29
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
@gsheni changed the title from "Issue#578 Arrow file support" to "Add support to read and write arrow files" May 25, 2021
Contributor

@tamargrey left a comment

Looking good. Just one comment about possibly moving the read_file tests into their own file.

@@ -310,6 +310,70 @@ def test_read_file_parquet_no_params(sample_df_pandas, tmpdir):
    pd.testing.assert_frame_equal(df_from_parquet, schema_df)


def test_read_file_arrow(sample_df_pandas, tmpdir):
Contributor

We're building up a fair number of read_file tests. Maybe worth giving them their own file in tests/utils?

Contributor

We also might be able to parameterize at least some of the read_file tests to reduce the number of separate tests.

Contributor Author

We could probably combine the read_file/read_file_no_params tests for the parquet, arrow, and feather formats. The original test_read_file_no_params, which tests csv, looks different though.

Contributor

@frances-h were you suggesting using pytest.mark.parametrize? Something like what we do in featuretools?

Contributor

Yeah, as another option to splitting them out since a lot of the tests are basically the same.
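
For illustration, here is a minimal sketch of what parametrizing near-identical round-trip tests could look like with `pytest.mark.parametrize`. The CSV and JSON helpers are hypothetical stand-ins built on the standard library so the snippet runs without pandas or pyarrow. They are not woodwork's `read_file`, and the tests in this PR may be organized differently.

```python
import csv
import json
from pathlib import Path

import pytest

# Sample data; values are strings because csv reads everything back as text.
ROWS = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]


def write_csv(path, rows):
    # Hypothetical helper: write rows as CSV with a header line.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    # Hypothetical helper: read CSV rows back as a list of dicts.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def write_json(path, rows):
    Path(path).write_text(json.dumps(rows))


def read_json(path):
    return json.loads(Path(path).read_text())


# Assumption: stand-in formats. In the PR these cases would correspond to
# the parquet, arrow, and feather variants of the read_file tests.
FORMATS = [
    ("csv", write_csv, read_csv),
    ("json", write_json, read_json),
]


@pytest.mark.parametrize("extension, writer, reader", FORMATS)
def test_read_file_round_trip(extension, writer, reader, tmp_path):
    # One test body covers every format; pytest runs it once per tuple.
    path = tmp_path / f"data.{extension}"
    writer(path, ROWS)
    assert reader(path) == ROWS
```

Adding another format then means adding one tuple to `FORMATS` rather than copying a whole test function.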

Contributor Author

I split them out in my latest commit. It's not as pretty, but it definitely cuts down on repetitive code.

@gsheni removed their request for review June 9, 2021 20:25
@gsheni merged commit 05cf871 into main Jun 9, 2021
@gsheni deleted the issue#578-arrow-file-support branch June 9, 2021 21:40

Successfully merging this pull request may close these issues.

Add support for reading Arrow file (read_file)
4 participants