Add support to read and write arrow files #948

tuethan1999 · 2021-05-25T18:26:57Z

Allow Woodwork to read and write Arrow files.
Closes #578: Add support for reading Arrow file (read_file)

codecov · 2021-05-25T18:28:27Z

Codecov Report

Merging #948 (f78ea1c) into main (51ed6db) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #948      +/-   ##
==========================================
- Coverage   99.92%   99.92%   -0.01%     
==========================================
  Files          45       46       +1     
  Lines        7553     7548       -5     
==========================================
- Hits         7547     7542       -5     
  Misses          6        6

| Impacted Files | Coverage Δ |
|---|---|
| woodwork/tests/utils/test_utils.py | `100.00% <ø> (ø)` |
| woodwork/deserialize.py | `100.00% <100.00%> (ø)` |
| woodwork/serialize.py | `100.00% <100.00%> (ø)` |
| woodwork/tests/accessor/test_serialization.py | `100.00% <100.00%> (ø)` |
| woodwork/tests/utils/test_read_file.py | `100.00% <100.00%> (ø)` |
| woodwork/utils.py | `100.00% <100.00%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 51ed6db...f78ea1c.

tamargrey

Looking good, just a comment about maybe creating a separate file for the `read_file` tests.

tamargrey · 2021-06-07T18:55:38Z

woodwork/tests/utils/test_utils.py

@@ -310,6 +310,70 @@ def test_read_file_parquet_no_params(sample_df_pandas, tmpdir):
    pd.testing.assert_frame_equal(df_from_parquet, schema_df)


+def test_read_file_arrow(sample_df_pandas, tmpdir):


tamargrey · Jun 7, 2021

We're building up a fair number of `read_file` tests. Maybe worth giving them their own file in `tests/utils`?

frances-h · Jun 8, 2021

We also might be able to parameterize at least some of the `read_file` tests to reduce the number of separate tests.

tuethan1999 · Jun 9, 2021

We could probably combine the `read_file`/`read_file_no_params` tests for the parquet, arrow, and feather formats. The original `test_read_file_no_params`, which tests CSV, looks different, though.

gsheni · Jun 9, 2021

@frances-h, were you suggesting using `pytest.mark.parametrize`? Something like we do in featuretools?

frances-h · Jun 9, 2021

Yeah, as an alternative to splitting them out, since a lot of the tests are basically the same.

tuethan1999 · Jun 9, 2021

I split them out in my latest commit. It's not as pretty, but it definitely cuts down on repetitive code.

Ethan Tu added 3 commits May 25, 2021 13:34

add arrow support

34b3076

add serialization for pandas

e118e55

documentation

ade4a27

auto-assign bot assigned tuethan1999 May 25, 2021

gsheni reviewed May 25, 2021

woodwork/utils.py (outdated, resolved)

Ethan Tu and others added 2 commits May 25, 2021 14:29

release notes

e50f186

Update woodwork/utils.py

ae61303

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

gsheni changed the title from "Issue#578 Arrow file support" to "Add support to read and write arrow files" May 25, 2021

gsheni requested review from tamargrey and frances-h May 25, 2021 18:46

tamargrey reviewed May 25, 2021

woodwork/serialize.py (outdated, resolved)

condense write dataframe

530c54e

gsheni reviewed May 26, 2021

woodwork/serialize.py (outdated, resolved)

Ethan Tu and others added 3 commits May 26, 2021 15:26

lint

09cd8bc

documentation

6818d1e

Merge branch 'main' into issue#578-arrow-file-support

4d67bc9

tuethan1999 requested review from gsheni and tamargrey May 26, 2021 20:49

frances-h reviewed Jun 2, 2021

woodwork/utils.py (outdated, resolved)

Ethan Tu added 3 commits June 3, 2021 14:01

duplicate arrow changes for feather

494314c

Merge branch 'main' into issue#578-arrow-file-support

0288423

fix tests

38c4278

tamargrey reviewed Jun 7, 2021

Ethan Tu added 4 commits June 9, 2021 14:03

move read_file tests to own file

4d06cfb

Merge branch 'main' into issue#578-arrow-file-support

9888f3b

remove unnecessary import

8d18331

parameterize test_read_file

10fe1b0

gsheni removed their request for review June 9, 2021 20:25

Merge branch 'main' into issue#578-arrow-file-support

f78ea1c

frances-h approved these changes Jun 9, 2021

tamargrey approved these changes Jun 9, 2021

gsheni merged commit 05cf871 into main Jun 9, 2021

gsheni deleted the issue#578-arrow-file-support branch June 9, 2021 21:40

