Python 3.9.6 threading compatibility #250
Comments
We got the issue 8 out of 10 times on our CI pipeline. By adding |
With the help of this post: https://stackoverflow.com/questions/15349997/assertionerror-when-threading-in-python we found out that the library under test has a subclass of I suppose it got stuck because the code under test got stuck. |
Glad to hear the problem was somewhere else. 😉 |
@CleanCut unfortunately, I combed over the library that is causing green to freeze, and I could only find a small bit of code where we subclass threading.Thread so we can collect the return value of run(). We do not use threads anywhere else in that library and we are not using I was not able to reproduce locally on macOS, either natively or under the same Docker container we use in our CI. It fails very consistently on our CI system with Python 3.9.6. I see that you wrote That's my 1 AM analysis and it might be wrong, but so far I really think the bug is in |
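For readers unfamiliar with the pattern mentioned above (subclassing threading.Thread to collect the return value of run()), here is a minimal sketch. The private library's code was not shared, so the class name `ReturnValueThread` and its attribute names are hypothetical, and this is only one common way to write it, not the code from the library under test.

```python
import threading


class ReturnValueThread(threading.Thread):
    """Hypothetical Thread subclass that keeps what its target returned."""

    def __init__(self, target, args=(), kwargs=None, **thread_kwargs):
        # Run the base initializer first; Thread.start() refuses to run
        # on an instance whose Thread.__init__ was never called.
        super().__init__(**thread_kwargs)
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self.result = None

    def run(self):
        # Store the return value instead of discarding it.
        self.result = self._fn(*self._fn_args, **self._fn_kwargs)

    def join(self, timeout=None):
        # Wait for the thread, then hand back whatever run() stored.
        super().join(timeout)
        return self.result


if __name__ == "__main__":
    worker = ReturnValueThread(target=lambda a, b: a + b, args=(2, 3))
    worker.start()
    print(worker.join())
```

Storing the callable under separate attribute names avoids relying on the private `_target`, `_args` and `_kwargs` attributes of `threading.Thread`.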
I did some experiments and patching green/process.py with no success. I'll investigate further and will let you know. |
👍🏻 I'll wait to hear what you find. |
I've patched
This is what I get:
Somehow we end up in the init of BaseProcess with a group argument set to I do not have deep experience with the multiprocessing library so my understanding is limited. The other issue is that this has been very difficult to reproduce. It is very consistently failing under our CI pipeline conditions but changing some elements avoids the issue:
Let me know if the data I dumped helps you and if there are other things you would like me to dump. Unfortunately, I cannot share a reproducible case since the Docker image and our code are private. |
I'll pause my investigation for a few days. I tried many different combinations, but I am still unable to pinpoint any specific test or combination of tests that triggers the issue. It seems that the number of tests might be a factor, since we have a few hundred tests in that directory. The next thing I will try is to revert the two threading patches added in Python 3.9.6 and see if they are the ones triggering the issue. |
Good news! I was able to reproduce the problem! 🎉 That diff you linked to was super helpful. You were correct that the problem was because of the addition of I was able to reproduce the problem in aa572e1 by adding a Now to fix the problem... |
Fixed in Green 3.3.0, which I just released. 🥳 🎈 🎉 |
Thanks a lot for the quick fix!!! I'll try 3.3.0 in our CI pipeline tomorrow and will let you know how it goes. |
🤞🏻 |
I've tested green 3.3.0 and I can confirm that the AssertionError stack trace is gone; however, the tests are still stuck in parallel mode and never return. Passing the I'll try to run some more experiments over the weekend. I'm wondering if adding a timeout per test might help me pinpoint which test might cause the situation to trigger. The good news is that thanks to your new CLI option I was able to reproduce on my laptop under macOS, which will help a lot. Here is what I suspect might be happening. We run our CI under Kubernetes, where the container is not getting the full CPU capacity of the host, but many processes in the container still believe the process has access to all the CPU/RAM resources. Without I did confirm that As far as I'm concerned, green is now even better today than it was last week :-) |
Oh, good. I'm glad you found a workaround. If we can find both a) a way to determine that we are running in k8s (an environment variable that k8s always sets, maybe?), AND b) a way for Python to determine a better CPU number in k8s, then we could have green adjust its auto CPU count under k8s. If we do go down this path, let's open a new issue to have the discussion in.
You are most welcome! |
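The two ideas raised above, (a) detecting k8s and (b) deriving a better CPU number, could be sketched roughly as follows. This is not what green does; it is only an illustration under stated assumptions: that the container uses cgroup v2 with its limit exposed at /sys/fs/cgroup/cpu.max, and that the KUBERNETES_SERVICE_HOST environment variable is present in pods. All function names here are hypothetical.

```python
import math
import os
from pathlib import Path


def parse_cgroup_v2_cpu_max(text):
    """Return the CPU limit from a cgroup v2 cpu.max string, or None if unlimited."""
    fields = text.split()
    if len(fields) != 2 or fields[0] == "max":
        return None
    quota, period = int(fields[0]), int(fields[1])
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def running_in_kubernetes(environ=os.environ):
    # Assumption: pods normally have this variable set for the API service.
    return "KUBERNETES_SERVICE_HOST" in environ


def effective_cpu_count(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Best-effort CPU count that honours a container CPU quota if one is found."""
    host_cpus = os.cpu_count() or 1
    # sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        host_cpus = len(os.sched_getaffinity(0)) or host_cpus
    try:
        limit = parse_cgroup_v2_cpu_max(Path(cpu_max_path).read_text())
    except (OSError, ValueError):
        limit = None  # No readable cgroup v2 limit; fall back to the host view.
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


if __name__ == "__main__":
    print("in k8s:", running_in_kubernetes())
    print("cpus:", effective_cpu_count())
```

Rounding the quota up keeps at least one worker when the limit is fractional; a more conservative tool might round down instead. Containers on cgroup v1 expose the quota through different files, which this sketch does not read.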
We are encountering a high rate of failure with some of our tests after upgrading to Python 3.9.6.
This is failing most of the time under Docker, but much less often under macOS with the same versions of green (3.2.6) and Python (3.9.6).
I'm not sure if the issue is with green itself but the stack trace seems to come from green.