Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Python] Run Arrow Python C++ with pytest and make sure they are run by the CI #32328

Closed
Tracked by #32425
asfimport opened this issue Jul 8, 2022 · 13 comments
Closed
Tracked by #32425

Comments

@asfimport
Copy link

After #13311 is merged{}, the tests for C PyArrow should be moved to Cython and the documentation about running the tests with GTest should be removed.{} we should:

  • try to run the tests from the pytest to make the whole process of code quality check simplified.

  • make sure the CI runs these PyArrow C++ test. Currently they are not being run on the CI due to all the builds having GTest not bundled, see: https://arrow.apache.org/docs/dev/developers/python.html#testing-pyarrow-c

    The  migration of the tests to Cython has not proved as a good option as there are couple of tests that should stay in the C++, for example
    TestMoves and similar. For this reason we will find some other solution to keep using GTest and run them with pytest, if possible.

Reporter: Alenka Frim / @AlenkaF
Assignee: Alenka Frim / @AlenkaF

PRs and other links:

Note: This issue was originally created as ARROW-17016. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Alenka Frim / @AlenkaF:
The approach on how we use GTest in PyArrow C\+\+ needs a change. Currently it depends on GTest being built from source with Arrow C\+\+. There are two approaches I see to change that:

  • expect GTest to be installed with Arrow C\+\+ and use find_package(GTest REQUIRED)

  • require GTest as a test dependency for PyArrow C\+\+ separately

    The first approach seems the best: PyArrow C\+\+ tests would be built if ARROW_BUILD_TESTS=on and in that case GTest is installed. Then I would try to use pytest to run the C\ tests. I do think I had some issues with finding the package when doing the refactoring work and am not sure it will work out of the box.

    @raulcd thank you for helping me with brainstorming for ideas!

    @kou, @pitrou could I get your feedback on this?

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
The big drawback with requiring Arrow C++ tests to be built is that it makes the build time significantly longer (there are many C++ tests).

So I would be in favor of making GTest an (optional) dependency of PyArrow C++.

@asfimport
Copy link
Author

Alenka Frim / @AlenkaF:
In that case there will have to be extra environment variable specified by the developer/user before building PyArrow to turn the tests on for PyArrow separately. I am fine with this approach and can go with that.

There was also a question Raul had on this approach when we brainstormed and I wonder also: will we have to, in this case, have similar third party dependency infrastructure as we do for Arrow C++ to download third party dependencies? Or will that not be needed only for GTest?

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Hmm, that's a good question. @kou what do you think?

@asfimport
Copy link
Author

Kouhei Sutou / @kou:
Sorry, I'm not familiar with Cython. Could you explain the reason for the following case? We can't write TestMoves family with Cython?

there are couple of tests that should stay in the C\, for example TestMoves and similar.

In general, I don't want PyArrow C\ to have Arrow C\'s ThirdpartyToolchain.cmake like feature to reduce maintenance cost. Maintaining ThirdpartyToolchain.cmake is high cost for us... So I like the "migrating PyArrow C\ tests from GTest based to pytest\Cython based" approach if possible. If we can remove GTest dependency from PyArrow C\, PyArrow C\+ depends on only our Arrow C\ related CMake packages.

Is TestMoves family only the blocker of the approach? Or are there more blockers?

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Cython does not really allow to express C++ concepts precisely. So some tests have to be written in C++.
If we don't want to use GTest, then we could perhaps still write some tests in C++, have them return a Status, then execute them from pytest?

Draft example:

#define ASSERT_EQ(x, y) { \
  auto&& _left = (x); \
  auto&& _right = (y); \
  if (_left != _right) { \
    return Status::Invalid("Expected equality but ", _left, " != ", _right); \
  } \
}

Status TestOwnedRefMoves() {
  std::vector<OwnedRef> vec;
  PyObject *u, *v;
  u = PyList_New(0);
  v = PyList_New(0);
  {
    OwnedRef ref(u);
    vec.push_back(std::move(ref));
    ASSERT_EQ(ref.obj(), nullptr);
  }
  vec.emplace_back(v);
  ASSERT_EQ(Py_REFCNT(u), 1);
  ASSERT_EQ(Py_REFCNT(v), 1);
  return Status::Ok();
}
# Cython test code

def test_owned_ref_moves():
    check_status(TestOwnedRefMoves())

@asfimport
Copy link
Author

Alenka Frim / @AlenkaF:
I think the suggestion to keep this tests in C++ but return a Status and then check these in the Cython is great, thank you for such a clear example!

I will start with implement this approach first thing and add it to the draft PR for this issue. If it works well I will continue with this work and ping once the PR is ready.

Thank you @kou @pitrou !!

@asfimport
Copy link
Author

Alenka Frim / @AlenkaF:
@pitrou would it make sense to leave all the C++ test where they are (even the decimal group) and have them return a Status also? Or would you rather see this group separately in Cython?

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
We could keep all the C++ tests if that's easier, yes.

@asfimport
Copy link
Author

Alenka Frim / @AlenkaF:
At this point it is not necessarily easier (almost half of the decimal tests are already in Cython and I haven't tried the last approach to turn them into Status in the C++ yet. But am asking as it seems it would be nicer to have them all in one place.

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Hmm. Yes, it would actually be nicer to have them all in the same place and using the same idiom, IMHO.

@asfimport
Copy link
Author

Alenka Frim / @AlenkaF:
Great, thanks for quick feedback!

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request 14117
#14117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants