ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py #8244

arw2019 · 2020-09-23T05:05:05Z

xref #6849 (comment)

The goals of this PR are:

- remove hard pandas dependency from FeatherDataset's __init__. Now we only check if pandas is present & if it is a recent enough version in methods like read_pandas that use it
- remove hard pandas dependencies in test_feather.py by switching to pyarrow methods whenever possible (e.g. use pa.table to generate data; compare tables directly using pa.equals)
- reduce boilerplate by using _check_arrow_roundtrip and _check_pandas_roundtrip wherever possible

github-actions · 2020-09-23T05:05:53Z

https://issues.apache.org/jira/browse/ARROW-8355

jorisvandenbossche

Thanks for working on this!
Can you check that all @pytest.mark.pandas decorators have been removed where possible?

jorisvandenbossche · 2020-09-30T13:25:59Z

python/pyarrow/tests/test_feather.py

-    df = pd.DataFrame(np.random.randn(*num_values),
-                      columns=['col_' + str(i)
-                               for i in range(num_values[1])])
+    table = pa.Table.from_arrays(


In principle we could also use pa.table(.. here instead (a bit more ergonomic to use)

arw2019 · 2020-10-05

Will switch to that.

jorisvandenbossche · 2020-09-30T13:28:06Z

python/pyarrow/tests/test_feather.py

@@ -128,19 +128,22 @@ def test_dataset(version):
    num_values = (100, 100)


I think the @pytest.mark.pandas can be removed here as well?

arw2019 · 2020-10-07

For this to work I had to remove the call to _check_pandas_version in FeatherDataset's __init__. I think that was a bug, since FeatherDataset is supposed to work without pandas. I added a call to _check_pandas_version in FeatherDataset.read_pandas, since we do want it there.
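
The pattern described in this reply, moving the optional-dependency check out of the constructor and into the one method that needs it, can be sketched in plain Python. Everything below is hypothetical: `LazyDataset`, `_require`, and the `frame_module` parameter are illustrative names, not pyarrow's `FeatherDataset` or `_check_pandas_version`, and the version comparison is omitted for brevity.

```python
import importlib


def _require(module_name):
    """Import an optional dependency on demand, or raise a clear ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this method but is not installed"
        ) from exc


class LazyDataset:
    """Hypothetical stand-in for a dataset class with one pandas-only method."""

    def __init__(self, paths, frame_module="pandas"):
        # No optional-dependency check here, so construction always succeeds.
        self.paths = list(paths)
        self._frame_module = frame_module

    def read_table(self):
        # Placeholder for the read path that needs no optional dependency.
        return {"paths": self.paths}

    def read_pandas(self):
        # The check runs only in the method that actually uses the library.
        module = _require(self._frame_module)
        return module, self.read_table()
```

With this layout, code that only calls `read_table` never triggers the import, while `read_pandas` fails with an explicit error when the library is missing.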

emkornfield · 2020-10-07T08:07:16Z

@jorisvandenbossche can this be merged now?

jorisvandenbossche · 2020-10-07T12:22:36Z

I don't think @arw2019 has pushed the update yet? (although the comments have been answered)

arw2019 · 2020-10-09T03:52:38Z

@jorisvandenbossche @emkornfield This is ready to go:

addressed comments
updated title & description
rebased (CI green)

jorisvandenbossche · 2020-10-09T14:49:05Z

Thanks @arw2019 !

arw2019 · 2020-10-09T14:54:30Z

Thanks @jorisvandenbossche for reviewing!

…t and minimize pandas dependency in test_feather.py

xref apache#6849 (comment)

The goals of this PR are:

- [x] remove hard `pandas` dependency from `FeatherDataset`'s `__init__`. Now we only check if `pandas` is present & if it is a recent enough version in methods like `read_pandas` that use it
- [x] remove hard `pandas` dependencies in `test_feather.py` by switching to `pyarrow` methods whenever possible (e.g. use `pa.table` to generate data; compare tables directly using `pa.equals`)
- [x] reduce boilerplate by using `_check_arrow_roundtrip` and `_check_pandas_roundtrip` wherever possible

Closes apache#8244 from arw2019/ARROW-8355

Authored-by: arw2019 <andrew.r.wieteska@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

arw2019 force-pushed the ARROW-8355 branch 7 times, most recently from de13ef4 to 2dd24e6 (September 28, 2020 14:23)

jorisvandenbossche reviewed Sep 30, 2020

arw2019 force-pushed the ARROW-8355 branch from 2dd24e6 to 6682cc1 (October 7, 2020 14:51)

ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

956907c

arw2019 force-pushed the ARROW-8355 branch from 6682cc1 to e205364 (October 7, 2020 15:34)

feedback

2ecd01e

arw2019 force-pushed the ARROW-8355 branch from e205364 to 2ecd01e (October 7, 2020 16:05)

arw2019 added 2 commits October 7, 2020 13:25

linting

021b5a2

remove _check_pandas_version from FeatherDataset __init__

c78cdf9

arw2019 changed the title from "ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather" to "ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py" (Oct 9, 2020)

jorisvandenbossche approved these changes Oct 9, 2020

jorisvandenbossche closed this in b8b4419 Oct 9, 2020

asfimport mentioned this pull request Oct 9, 2020

[Python] Reduce the number of pandas dependent test cases in test_feather #24540

Closed
