[BUG] v3 multi-threading fails due to user cache #44
Comments
@mtuska Thanks for reporting this. I'll try to take a look at this and resolve it. Feel free to open up a PR if you find a solution before I do.
@mtuska If you are comfortable installing code from source, give this branch a try:
Wow, you were quick on this. I'll give it a run-through on Monday morning. Additionally, I've seen a strange issue with SNMPv3 usmStatsNotInTimeWindows when calling different hosts, but I'll open a new issue for that. I still need to find or build a reproduction of it.
After some real-world tests (maybe an artificial sleep should be added to the threading test), the branch appears to be single-threaded: from what I'm witnessing, the responses all arrive one at a time. This queries 414 devices (some v2, some v3; the v3 creds are shared) with 64 threads. Time with the cache clearing removed in the main branch (main): Time with the suggested branch (44-bug-v3-multi-threading-fails-due-to-user-cache) until segmentation fault: Thread 65 "python3" received signal SIGSEGV, Segmentation fault.
Oh, after a bit of investigating I recall I made another fix for this segmentation fault. The issue comes from running both v1/v2 and v3 in the same process, as both will call over to __remove_user_from_cache. (I think the test didn't show this issue because it goes v1, v2c, then v3, whereas the real world is more random.) This would require additional checking. In my local edits of easysnmp I had added checks for a valid actUser:
Currently rerunning it to double-check that the fault is removed by this. It will take a while with my devices due to the single-threaded nature of the branch. Edit: Final time for the branch with the above edit to avoid the segmentation fault:
Ahh, you're right, it does go in order: v1, v2c, etc. Let me add a test case that's more "real world". And I'll pull in your version of Edit: Tests have been added to try to mimic "real world" scenarios. The Linux tests all pass, but it looks like the MacOS tests are showing the "No Such User" exists error - https://github.com/carlkidcrypto/ezsnmp/actions/runs/8151441707/job/22279325545 Based on previous comments, it is suspected to be an SNMPD bug... hmm.
@mtuska Are you seeing the "No Such User" exists errors on certain devices using a certain SNMP version like v1, v2, etc.? Would you happen to know what SNMPD version the end devices are using?
I'm still seeing it act single-threaded, and it appears both v2 and v3 are failing for the most part. For my sanity's sake I commented out the std::lock_guard and __remove_user_from_cache calls; afterwards it is no longer single-threaded and calls are being made successfully.
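To illustrate why a lock can make a thread pool look single-threaded, here is a minimal Python sketch. It is not the code from the branch: `CACHE_LOCK`, `request_wide_lock`, `request_narrow_lock`, and `timed_run` are hypothetical names, and `time.sleep` stands in for a network round trip. It only assumes that a lock held across the whole request would prevent the waits from overlapping.

```python
import threading
import time

CACHE_LOCK = threading.Lock()  # hypothetical process-wide lock


def request_wide_lock(delay):
    # The lock is held for the whole request, so the simulated
    # network waits of different threads cannot overlap.
    with CACHE_LOCK:
        time.sleep(delay)


def request_narrow_lock(delay):
    # The simulated network wait happens without the lock; only the
    # (empty, illustrative) shared-state update is protected.
    time.sleep(delay)
    with CACHE_LOCK:
        pass


def timed_run(target, n_threads=4, delay=0.05):
    """Start n_threads threads running target(delay); return elapsed seconds."""
    threads = [
        threading.Thread(target=target, args=(delay,)) for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Comparing `timed_run(request_wide_lock)` with `timed_run(request_narrow_lock)` should show the wide-lock variant taking roughly `n_threads * delay`, while the narrow-lock variant stays close to a single `delay`. Whether the branch actually holds its lock this widely is not confirmed by the thread.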
Thanks for the update. Are you able to attach the Python script you are using? I want to understand both the issue and use case. It'll take some time.
I've been using my own wrapper to abstract the actual snmp classes and build a mib helper so I can easily transition between libraries, but I made a shorter version without it for this issue.
* Create test_multi_threading.py
  - added a test per what @mtuska posted in the bug.
  - adding a sleep "fixes" the issue. I think we may have a possible resource contention problem.
* Mutex
  - adding a mutex for the protection of the `__remove_user_from_cache`
  - updated tests. Timeout errors are normal IMHO since we are hammering the snmp server.
* Another Mutex
  - Fixed another issue that arose with multi-threading. Was seeing inconsistent failures of PDUs.
* Flake
  - fixing flake errors
  - letting req text file be the source of truth for the "Install pip dependencies" step.
* test.yml: Pytest coverage comment step
  - the "Pytest coverage comment" seems to be broken. Attempting to add the path.
* Update tests.yml
  - debug getting info out of the tests.yml. Step "Create multi-file output listing" may not be working as expected.
  - quick attempt to fix MacOS tests. All but one test case fail.
* Update tests.yml
  - okay, that made the step for macos fail. reverting change.
* MishaKav/pytest-coverage-comment
  - hmm I think the latest update to this broke stuff. Let me try reverting versions.
* Oops
  - deleted code by accident. Revert that and still use 1.1.50 of pytest-coverage-comment.
* /pytest-coverage-comment@v1.1.51
  - back to /pytest-coverage-comment@v1.1.51, no change. Turns out I think all tests have to pass in order for coverage to work...
* Added More Tests
  - fixed the null check, thanks again @mtuska
  - added multi proc tests as well. still not seeing the issues via TDD.
* Adding more jobs
  - added more jobs to tests
  - revert a move of the nulls, valgrind complaints grew...
* Update test_multi_proc_thread.py
  - black .
* Bump to version 1.0.0d
  - okay this is running and passing on my MacOS 14.13.1 using python3. Not sure what's up with the github actions image of macos.
* Oops Bad Release Version
  - fixing the bad release version. Looks like we can't use letters anymore. I'll have to document my process. Let's stick to 1.0.0.a0 - aX for alpha builds, 1.0.0.b0 - bX for beta builds, 1.0.0.c0 - cX for release candidate builds.
* Threads
  - this gets threads to work. The integration tests for threads all pass.
  - this gets multi process to mostly work. They occasionally error out with "USM unknown security name (no such user exists)" or "no such name error encountered"
* Adding Integration Test
  - this adds integration tests per bug #44
  - this adds integration tests to the tests.yml file as well.
  - multi proc still having issues with USM errors.
EzSNMP release version OR commit number
1.0.0c
Operating System and Version
Net-SNMP Library Version
Describe the bug
When multiple threads use SNMPv3, the user cache is cleared when any one of the sessions ends, which causes the sessions in the other threads to fail.
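The failure mode described here can be modelled with a toy example. This is only a sketch of the idea, not ezsnmp or Net-SNMP code: `USER_CACHE`, `ToySession`, `UnknownUserError`, and `demo` are hypothetical names, and the cache is assumed to be a single process-wide table keyed by security name. The example is sequential on purpose so the outcome is deterministic; with threads the same interleaving would happen at unpredictable times.

```python
# Toy model only: none of these names come from ezsnmp or Net-SNMP.


class UnknownUserError(Exception):
    """Stands in for a 'no such user exists' style failure."""


USER_CACHE = {}  # hypothetical process-wide cache keyed by security name


class ToySession:
    def __init__(self, user):
        self.user = user
        # Opening a session registers the user in the shared cache.
        USER_CACHE[user] = "cached-credentials"

    def get(self, oid):
        # A request needs the user to still be present in the cache.
        if self.user not in USER_CACHE:
            raise UnknownUserError(self.user)
        return f"{oid}=ok"

    def close(self):
        # Unconditional removal: other sessions sharing this user lose it too.
        USER_CACHE.pop(self.user, None)


def demo():
    """Two sessions share one user; closing the first breaks the second."""
    a = ToySession("shared_user")
    b = ToySession("shared_user")
    a.close()
    try:
        b.get("sysDescr.0")
    except UnknownUserError:
        return "b failed after a closed"
    return "b survived"
```

In this model, `demo()` ends in the failure branch because session `a` removed an entry that session `b` still depended on, which matches the symptom reported when v3 credentials are shared across threads.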
To Reproduce
The pytest I've built to replicate this issue:
test_threading.py
With 1 worker the test passes without issue, but with 8 or 16 workers it fails.
As a workaround, I was able to comment out the call to "__remove_user_from_cache" on line 1611 to make the test pass. It's probably not the ideal fix, but it seems to confirm that the issue is related to the user caching.
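The attached test is not reproduced in this thread, so the following is only a rough sketch of how such a worker-count comparison might be structured. `query_host` is a hypothetical stub that always succeeds; a real test would open an SNMPv3 session there and perform a request. `run_with_workers` is likewise an illustrative helper, not part of ezsnmp.

```python
from concurrent.futures import ThreadPoolExecutor


def query_host(host):
    """Hypothetical stand-in for opening a session to `host` and doing one GET."""
    # A real test would create an SNMPv3 session here and return its result.
    return (host, "ok")


def run_with_workers(hosts, max_workers):
    """Run query_host over hosts in a thread pool; return (results, errors)."""
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its host so failures can be attributed.
        futures = {pool.submit(query_host, h): h for h in hosts}
        for future, host in futures.items():
            try:
                results.append(future.result())
            except Exception as exc:  # collect instead of aborting the run
                errors.append((host, exc))
    return results, errors
```

Calling `run_with_workers(hosts, n)` for `n` in 1, 8, and 16 and checking that `errors` stays empty mirrors the comparison described above. With the stub it always passes; the interesting behaviour only appears once `query_host` talks to a real SNMPv3 agent.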
Expected behavior
SNMPv3 sessions work in multiple threads.