High CPU Node 100% RC Version 6.2.2 #29515

Open
nileshA-addweb opened this issue Jun 12, 2023 · 7 comments
@nileshA-addweb

nileshA-addweb commented Jun 12, 2023

Description:

Rocket.Chat stops working on its own: the Node process reaches 100% CPU, after which MongoDB and Node stop processing further requests and RC goes down. The containers have to be restarted to bring RC back up.

Steps to reproduce OR Actual behavior:

When Rocket.Chat stops working and Node is at 100% CPU, we have to run docker-compose down and docker-compose up to bring RC back to normal status.

Expected behavior:

Rocket.Chat should keep running without Node reaching 100% CPU.

Server Setup Information:

Version of Rocket.Chat Server: 6.2.2
Operating System: Ubuntu 20.04.6 LTS
Deployment Method: Docker
Number of Running Instances: 1
DB Replicaset Oplog: Yes
NodeJS Version: 14.21.2 - x64
MongoDB Version: 5.0.15
MongoDB Engine: wiredTiger
USE_NATIVE_OPLOG=true

Client Setup Information

Desktop App: App 3.8.13
Operating System: Ubuntu 22.04.2 LTS

Relevant logs:

{"t":{"$date":"2023-06-12T08:18:56.880+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn17","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_message","command":{"aggregate":"rocketchat_message","pipeline":[{"$match":{"t":"omnichannel_placed_chat_on_hold"}},{"$group":{"_id":"$rid"}},{"$group":{"_id":null,"total":{"$sum":1}}}],"cursor":{},"lsid":{"id":{"$uuid":"e57cbd99-4ba4-4b93-8462-9dad3c7055fc"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557927,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat","$readPreference":{"mode":"secondaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1499728,"cursorExhausted":true,"numYields":1599,"nreturned":0,"queryHash":"48B4E645","planCacheKey":"2C688F3D","reslen":243,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1601}},"Global":{"acquireCount":{"r":1601}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":117572249,"timeReadingMicros":562075}},"remote":"172.26.0.3:36464","protocol":"op_msg","durationMillis":6921}}

{"t":{"$date":"2023-06-12T08:18:56.693+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn23","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_message","command":{"aggregate":"rocketchat_message","pipeline":[{"$match":{"t":"voip-call-on-hold"}},{"$group":{"_id":"$rid"}},{"$group":{"_id":null,"total":{"$sum":1}}}],"cursor":{},"lsid":{"id":{"$uuid":"eaee148b-16ad-405a-ab5e-b4d25750a679"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557927,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat","$readPreference":{"mode":"secondaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1499728,"cursorExhausted":true,"numYields":1592,"nreturned":0,"queryHash":"48B4E645","planCacheKey":"2C688F3D","reslen":243,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1594}},"Global":{"acquireCount":{"r":1594}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":90880247,"timeReadingMicros":490903}},"remote":"172.26.0.3:56300","protocol":"op_msg","durationMillis":6728}}

{"t":{"$date":"2023-06-12T08:18:51.412+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn18","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_uploads","command":{"aggregate":"rocketchat_uploads","pipeline":[{"$group":{"_id":"total","total":{"$sum":"$size"}}}],"cursor":{},"lsid":{"id":{"$uuid":"6869a1c6-8c34-4935-a74d-f7227462cc20"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557930,"i":5}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat","$readPreference":{"mode":"secondaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":46476,"cursorExhausted":true,"numYields":59,"nreturned":1,"reslen":281,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":62}},"Global":{"acquireCount":{"r":62}},"Mutex":{"acquireCount":{"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":31654553,"timeReadingMicros":398635}},"remote":"172.26.0.3:36480","protocol":"op_msg","durationMillis":901}}

{"t":{"$date":"2023-06-12T08:18:50.984+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn28","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_sessions","command":{"insert":"rocketchat_sessions","documents":[{"_id":"6486d4ea14052d142da7f2b3","type":"session","sessionId":"YseHbbzcRHGXG33zq","instanceId":"325c183f-ac7f-4373-8ee7-bf55544f177a","loginToken":"1RgtANdc+7ZFd6QV6VLaFl6KFBwq7HydQlhaXrJcC1M=","ip":"223.177.186.63","host":"chat.addwebsolution.in","device":{"type":"desktop-app","name":"Rocket.Chat","longVersion":"3.9.3","os":{"name":"Windows","version":"10"},"version":"3.9.3"},"userId":"AGuLQYrYHQKB7E52q","roles":["user","HR"],"mostImportantRole":"custom-role","loginAt":{"$date":"2023-06-12T08:18:50.779Z"},"day":12,"month":6,"year":2023,"searchTerm":"Rocket.Chatdesktop-appWindowsYseHbbzcRHGXG33zqAGuLQYrYHQKB7E52q","createdAt":{"$date":"2023-06-12T08:18:50.810Z"},"_updatedAt":{"$date":"2023-06-12T08:18:50.811Z"}}],"ordered":true,"lsid":{"id":{"$uuid":"834cfe99-4a48-4d8d-9cc0-c2dc07a29196"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557930,"i":6}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat"},"ninserted":1,"keysInserted":15,"numYields":0,"reslen":230,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1,"w":1}},"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"r":1,"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":2},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":176417,"timeReadingMicros":593},"timeWaitingMicros":{"schemaLock":17497}},"remote":"172.26.0.3:56356","protocol":"op_msg","durationMillis":163}}
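
To compare the slow-query entries above at a glance, a small sketch like the following could tabulate them. In the quoted logs, both `rocketchat_message` aggregations match on `t`, run as COLLSCAN over 1,499,728 documents and return nothing. The sketch assumes the log is MongoDB's one-JSON-document-per-line format as quoted here; `summarizeSlowQueries` and `sampleLog` are hypothetical names, and the sample lines are shortened copies of the entries above rather than full log output.

```javascript
// Summarize MongoDB structured slow-query log lines (one JSON document per line).
// The field names used here (msg, attr.ns, attr.planSummary, attr.docsExamined,
// attr.durationMillis) are the ones visible in the log lines quoted in this issue.
function summarizeSlowQueries(logText) {
  const rows = [];
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try {
      entry = JSON.parse(trimmed);
    } catch (err) {
      continue; // skip anything that is not a JSON log line
    }
    if (entry.msg !== "Slow query" || !entry.attr) continue;
    const a = entry.attr;
    rows.push({
      ns: a.ns,
      plan: a.planSummary ?? "(none)", // inserts carry no planSummary
      docsExamined: a.docsExamined ?? 0,
      durationMillis: a.durationMillis ?? 0,
    });
  }
  // Slowest first.
  return rows.sort((x, y) => y.durationMillis - x.durationMillis);
}

// Shortened sample lines in the same shape as the logs above (illustration only).
const sampleLog = [
  '{"msg":"Slow query","attr":{"ns":"rocketchat.rocketchat_message","planSummary":"COLLSCAN","docsExamined":1499728,"durationMillis":6921}}',
  '{"msg":"Slow query","attr":{"ns":"rocketchat.rocketchat_uploads","planSummary":"COLLSCAN","docsExamined":46476,"durationMillis":901}}',
  '{"msg":"Slow query","attr":{"ns":"rocketchat.rocketchat_sessions","durationMillis":163}}',
].join("\n");

console.table(summarizeSlowQueries(sampleLog));
```

Sorting by `durationMillis` puts the two `rocketchat_message` collection scans at the top, which are the queries that dominate the quoted log excerpt.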

@Gummikavalier

If you have OTR enabled and people use it, this bug could be the cause:
#28918

@nileshA-addweb
Author

We are still facing this issue. I have shared the MongoDB logs above in case they point to a working fix for this problem, which occurs daily.

@nileshA-addweb
Author

Can anyone help us resolve this issue? It occurs at random times.

@nileshA-addweb
Author

We are using the RC version below with the Docker image and are seeing constantly high CPU usage by Node, which makes RC unavailable. There are no slow queries or other MongoDB log entries that would explain it. Can anyone help look into this and suggest a solution?

Rocket.Chat Version: 6.2.9
NodeJS Version: 14.21.3 - x64
MongoDB Version: 5.0.18
MongoDB Engine: wiredTiger
Platform: linux
Process Port: 3000
Site URL:
ReplicaSet OpLog: Enabled
Commit Hash: abf746733b
Commit Branch: HEAD

@shiryov

shiryov commented Aug 9, 2023

Still an issue in 6.2.10 and 6.3.0.
We do not use omnichannel; it is disabled in the settings.
image
But every time an instance starts, this very long query runs. There are 50M messages in our database, and the query takes about a minute. The query also runs during normal operation throughout the day and reduces server performance.
slow_otr_op.txt

@shiryov

shiryov commented Aug 11, 2023

@nileshA-addweb this helps:

use <db_name>
db.rocketchat_message.createIndex({ t: 1 }, { sparse: true })
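
A sparse index only holds entries for documents that contain the indexed field, so it stays small if most messages carry no `t` value (an assumption about the data, not something confirmed in this thread). One way to check whether the aggregation from the logs stops scanning the whole collection after the index is created is to inspect its explain output. The sketch below is plain JavaScript that walks an explain document looking for a COLLSCAN stage; `findStages` and `usesCollScan` are hypothetical helper names, and the two sample objects are made-up, trimmed explain shapes, not output from a real server.

```javascript
// Recursively collect every "stage" name found anywhere in an explain document.
// Walking the whole object avoids depending on the exact explain layout,
// which differs between find() and aggregate() and between server versions.
function findStages(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) findStages(item, found);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    for (const value of Object.values(node)) findStages(value, found);
  }
  return found;
}

// True if any plan stage in the explain document is a full collection scan.
function usesCollScan(explainOutput) {
  return findStages(explainOutput).includes("COLLSCAN");
}

// In mongosh, the explain document could be obtained with something like:
//   const plan = db.rocketchat_message.explain("executionStats").aggregate([
//     { $match: { t: "omnichannel_placed_chat_on_hold" } },
//     { $group: { _id: "$rid" } },
//     { $group: { _id: null, total: { $sum: 1 } } },
//   ]);
//   usesCollScan(plan);

// Made-up, trimmed explain shapes for illustration only.
const beforeIndex = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };
const afterIndex = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "t_1" } },
  },
};

console.log(usesCollScan(beforeIndex)); // true
console.log(usesCollScan(afterIndex)); // false
```

If `usesCollScan` still returns true after the index exists, `db.rocketchat_message.getIndexes()` in mongosh shows whether the index was actually created on the expected database.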

@nileshA-addweb
Author

nileshA-addweb commented Aug 11, 2023

@shiryov
Will this affect the notifications that appear when someone tags us in a channel in RC? If there is no impact on notifications or other functionality, we can try it.
