ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py #8244
de13ef4 to 2dd24e6
Thanks for working on this!
Can you check that all `@pytest.mark.pandas` decorators have been removed where possible?
python/pyarrow/tests/test_feather.py
    df = pd.DataFrame(np.random.randn(*num_values),
                      columns=['col_' + str(i)
                               for i in range(num_values[1])])
    table = pa.Table.from_arrays(
In principle we could also use `pa.table(..` here instead (a bit more ergonomic to use)
will switch to that
@@ -128,19 +128,22 @@ def test_dataset(version):
    num_values = (100, 100)
I think the `@pytest.mark.pandas` can be removed here as well?
Done
For this to work I had to remove the call to `_check_pandas_version` in `FeatherDataset`'s `__init__`. I think that was a bug, since `FeatherDataset` is supposed to work without `pandas`. I added a call to `_check_pandas_version` in `FeatherDataset.read_pandas`, since we do want it there.
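
The change described in this comment moves an optional-dependency check out of the constructor and into the one method that needs it. The pattern can be sketched roughly as follows. This is not pyarrow's implementation: `LazyDataset`, `read_rows`, `read_with` and `_check_optional_dependency` are hypothetical names standing in for `FeatherDataset`, its pandas-free readers, `read_pandas` and `_check_pandas_version`.

```python
import importlib


def _check_optional_dependency(module_name):
    """Hypothetical stand-in for a version check: import the module or raise."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this method"
        ) from exc


class LazyDataset:
    """Hypothetical dataset that only needs the optional module on demand."""

    def __init__(self, rows):
        # Deliberately no dependency check here: construction must work
        # even when the optional module is not installed.
        self._rows = list(rows)

    def read_rows(self):
        # Core read path, free of the optional dependency.
        return list(self._rows)

    def read_with(self, module_name):
        # The check lives in the only method that uses the dependency.
        module = _check_optional_dependency(module_name)
        return module, self.read_rows()


dataset = LazyDataset([1, 2, 3])
rows = dataset.read_rows()
```

Under this layout, constructing the object and calling `read_rows` never touch the optional module; only `read_with` can raise `ImportError`.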
@jorisvandenbossche can this be merged now?
I don't think @arw2019 has pushed the update yet? (Although the comments have been answered.)
@jorisvandenbossche @emkornfield This is ready to go:
Thanks @arw2019!
Thanks @jorisvandenbossche for reviewing!
…t and minimize pandas dependency in test_feather.py

xref apache#6849 (comment)

The goals of this PR are:
- [x] remove hard `pandas` dependency from `FeatherDataset`'s `__init__`. Now we only check if `pandas` is present & if it is a recent enough version in methods like `read_pandas` that use it
- [x] remove hard `pandas` dependencies in `test_feather.py` by switching to `pyarrow` methods whenever possible (e.g. use `pa.table` to generate data; compare tables directly using `pa.equals`)
- [x] reduce boilerplate by using `_check_arrow_roundtrip` and `_check_pandas_roundtrip` wherever possible

Closes apache#8244 from arw2019/ARROW-8355

Authored-by: arw2019 <andrew.r.wieteska@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
xref #6849 (comment)

The goals of this PR are:
- [x] remove hard `pandas` dependency from `FeatherDataset`'s `__init__`. Now we only check if `pandas` is present & if it is a recent enough version in methods like `read_pandas` that use it
- [x] remove hard `pandas` dependencies in `test_feather.py` by switching to `pyarrow` methods whenever possible (e.g. use `pa.table` to generate data; compare tables directly using `pa.equals`)
- [x] reduce boilerplate by using `_check_arrow_roundtrip` and `_check_pandas_roundtrip` wherever possible