ARROW-16548: [Python] Add pytest.mark.parquet to all tests under tests/parquet package #13147

raulcd · 2022-05-13T10:05:49Z

When Arrow and pyarrow are built without PARQUET and the Parquet tests are run, the first test executed is not correctly marked:

> /home/raulcd/open_source/arrow/python/pyarrow/tests/conftest.py(244)pytest_runtest_setup()
-> for mark in item.iter_markers():
(Pdb) item
<Function test_parquet_invalid_version>
(Pdb) [x for x in item.iter_markers()]
[]
(Pdb)

As a result, the test is not skipped as it should be. This was found in the minimal builds PR (#13113), in this job failure: https://github.com/ursacomputing/crossbow/runs/6407176338?check_suite_focus=true
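For context, a marker-based skip hook of the kind the pdb session above stops in can be sketched as follows. This is a simplified stand-in, not the actual pyarrow conftest.py: ENABLED_GROUPS, OPTIONAL_GROUPS and disabled_groups are hypothetical names. It shows why an empty item.iter_markers() matters: with no marks on the test, the hook has nothing to match, so the test runs instead of being skipped.

```python
# conftest.py (simplified sketch, not the actual pyarrow conftest)

# Hypothetical: optional components that tests may require.
OPTIONAL_GROUPS = {"parquet", "parquet_encryption", "dataset"}
# Hypothetical: components compiled into this build ("parquet" is absent).
ENABLED_GROUPS = {"dataset"}


def disabled_groups(marker_names, enabled=ENABLED_GROUPS):
    """Return the optional groups a test requires that are not enabled."""
    return sorted((set(marker_names) & OPTIONAL_GROUPS) - set(enabled))


def pytest_runtest_setup(item):
    # Imported lazily so the helper above has no third-party dependency.
    import pytest

    missing = disabled_groups(mark.name for mark in item.iter_markers())
    if missing:
        pytest.skip("requires disabled component(s): " + ", ".join(missing))
```

With an unmarked test, disabled_groups receives no names and returns an empty list, which corresponds to the behaviour observed above.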

All tests under the parquet package have the following structure:

parquet
├── test_basic.py
├── test_compliant_nested_type.py
├── test_dataset.py
├── test_data_types.py
├── test_datetime.py
├── test_encryption.py
├── test_metadata.py
├── test_pandas.py
├── test_parquet_file.py
└── test_parquet_writer.py

The implementation correctly applies the parquet mark to every individual test in this structure.

github-actions · 2022-05-13T10:06:16Z

https://issues.apache.org/jira/browse/ARROW-16548

github-actions · 2022-05-13T10:06:18Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

raulcd · 2022-05-13T10:25:44Z

@jorisvandenbossche other solutions that would fix this issue:

Add pytestmark = pytest.mark.parquet to each test file individually instead of to the common file
Import pytestmark in each individual test file, even where it is not otherwise used (from pyarrow.tests.parquet.common import pytestmark)

I was going down a rabbit hole with importlib and load_module, but I don't think it is worth spending more time on this given that we have three possible solutions.
One nice thing is I had to go back to a talk I gave at PyCon Spain on how import works :)

jorisvandenbossche · 2022-05-13T11:06:22Z

Hmm, I would actually have expected that the pytestmark = .. in tests/parquet/__init__.py (and not the one in common.py) would apply that mark to all the tests in that directory.
But if that seems to not work correctly in practice, I think adding pytestmark = pytest.mark.parquet to every file might be the most "low tech" solution?

raulcd · 2022-05-13T12:56:17Z

Thanks @jorisvandenbossche. I was able to reproduce the issue in each one of the test files individually. This solution fixes the test failures when PARQUET is not enabled.

pitrou

+1 from me assuming that it does solve the issue.

ursabot · 2022-05-17T15:51:09Z

Benchmark runs are scheduled for baseline = 52a051b and contender = c032290. c032290 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.08% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.67% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] c032290b ec2-t3-xlarge-us-east-2
[Finished] c032290b test-mac-arm
[Finished] c032290b ursa-i9-9960x
[Finished] c032290b ursa-thinkcentre-m75q
[Finished] 52a051b1 ec2-t3-xlarge-us-east-2
[Finished] 52a051b1 test-mac-arm
[Finished] 52a051b1 ursa-i9-9960x
[Finished] 52a051b1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

jorisvandenbossche · 2022-05-18T14:15:02Z

python/pyarrow/tests/parquet/test_encryption.py

 pytestmark = pytest.mark.parquet_encryption
+pytestmark = pytest.mark.parquet


Ah, we can't assign two marks this way (it redefines the same variable); it should be something like pytestmark = [..., ...], I think.
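A minimal sketch of the list form, assuming the two mark names from the diff above. pytest accepts either a single mark or a list of marks in a module-level pytestmark, so with a list both marks apply to every test in the file.

```python
import pytest

# A single module-level assignment holding both marks. Two separate
# assignments would leave only the last one bound to the name.
pytestmark = [
    pytest.mark.parquet_encryption,
    pytest.mark.parquet,
]
```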

ARROW-16548: [Python] Add pytest.mark.parquet to all tests under test…

9d138ba

…s/parquet package

github-actions bot added the Component: Python label May 13, 2022

Remove manual test marking via pytest_collection_modifyitems and add …

a2276f4

…the pytest mark on each parquet python file

pitrou approved these changes May 16, 2022

jorisvandenbossche approved these changes May 17, 2022

jorisvandenbossche closed this in c032290 May 17, 2022

jorisvandenbossche reviewed May 18, 2022

raulcd mentioned this pull request May 18, 2022

MINOR: Fix wrongly redefining pytestmark for parquet encryption tests #13189

Merged
