Gathering resource stats took longer than 5 seconds #171
Comments
Probably related to #172. The messages disappear when I turn off metrics scraping.
Right now I have set a fixed 5-second limit for gather requests. You can use
Same problem here. Scraping also fails because of the timeout (vmagent is set to 10 seconds by default):
I think it would be better to implement a non-blocking caching mechanism for the metrics. Multiple agents might gather metrics at the same time, and it is unacceptable that this affects LINSTOR so badly.
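To illustrate the non-blocking caching idea suggested above, here is a minimal sketch in Python. It is not LINSTOR code: the names `CachedMetrics`, `gather`, and `max_age` are hypothetical, and the only assumption is that gathering is a slow function returning the metrics text. Callers always get the last stored value immediately, and at most one background refresh runs at a time, so several scrapers hitting the endpoint together do not trigger several gathers.

```python
import threading
import time
from typing import Callable, Optional


class CachedMetrics:
    """Serve the last gathered value; refresh in the background when stale."""

    def __init__(self, gather: Callable[[], str], max_age: float = 5.0) -> None:
        self._gather = gather          # hypothetical slow gather function
        self._max_age = max_age        # seconds before a value counts as stale
        self._lock = threading.Lock()
        self._value: Optional[str] = None
        self._stamp = 0.0
        self._refreshing = False

    def _refresh(self) -> None:
        # Runs in a worker thread; the slow gather happens outside the lock.
        try:
            value = self._gather()
            with self._lock:
                self._value = value
                self._stamp = time.monotonic()
        finally:
            with self._lock:
                self._refreshing = False

    def get(self) -> Optional[str]:
        # Never blocks on the gather itself. Returns None until the first
        # refresh has completed, otherwise the most recent cached value.
        with self._lock:
            stale = (self._value is None
                     or time.monotonic() - self._stamp > self._max_age)
            if stale and not self._refreshing:
                self._refreshing = True
                threading.Thread(target=self._refresh, daemon=True).start()
            return self._value
```

The trade-off in this sketch is that a scraper may see values up to `max_age` seconds old (plus the gather time), in exchange for the request never waiting on the gather.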
The next LINSTOR version will report cached values, which will speed things up tremendously. If you want, you can try these 2 jars (controller-only upgrade):
@rp-,
# curl localhost:3370/metrics
[{"ret_code":-4611686018427386906,"message":"Exception thrown.","error_report_ids":["5F3D95EA-00000-000027"]}]
# curl localhost:3370/health
Services not running: NetComService
5F3D95EA-00000-000027.log
Here is an updated version with null pointer checks. Thanks for testing.
@rp-, thank you. Metrics collection now takes about 2 seconds, but health is still not working properly:
# curl localhost:3370/health
Services not running: NetComService
Please also check ErrorReport-5F3E345B-00000-000000.log
Just to be sure:
# sha256sum /usr/share/linstor-server/lib/controller-1.8.0.jar
65d0b39e75c40c57a5e0bc53498795f2d1e14a9977693298b24b04837b3f350c /usr/share/linstor-server/lib/controller-1.8.0.jar
Are all nodes connected? And does it maybe recover after a while? By the way, also try to add
One node is offline.
I don't think so; my health check kills the pod sooner, 5 minutes after startup :)
Will do.
with
Anyway, I am still getting these errors in my controller.log if I run many requests to
health is working now |
Hi.
controller.log
After a controller restart, the nodes return to a normal state:
Does this cluster have just these 11 nodes?
Hi, after upgrading to v1.8.0 my log is full of these messages:
5F3AF821-00000-000000.log
5F3AF821-00000-000001.log
5F3AF821-00000-000003.log
5F3AF821-00000-000002.log
5F3AF821-00000-000004.log
5F3AF821-00000-000005.log
5F3AF821-00000-000006.log
5F3AF821-00000-000007.log