Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many Bad resource ID errors since Deno 1.27.0 #16450

Closed
jespertheend opened this issue Oct 27, 2022 · 12 comments · Fixed by #16501
Closed

Many Bad resource ID errors since Deno 1.27.0 #16450

jespertheend opened this issue Oct 27, 2022 · 12 comments · Fixed by #16501
Assignees
Labels
bug Something isn't working correctly high priority needs info needs further information to be properly triaged

Comments

@jespertheend
Copy link
Contributor

Apparently I had newly created servers set to automatically download the latest version of Deno (I guess I should have pinned the version), but since 1.27.0 released earlier today websocket connections have been closing abruptly and my logs are full of "Bad resource ID" errors. The errors specifically happen during the WebSocket.send() call.

I'm still investigating what the issue specifically could be, but I'm afraid I'll have to resort to bisecting Deno in production. Did anything change about WebSockets internally?

@bartlomieju bartlomieju added bug Something isn't working correctly high priority labels Oct 27, 2022
@crowlKats
Copy link
Member

There were 3 PRs that changed websocket:
#16004
#16320
#16325

The first one is to prevent errors, so i doubt that one is related.
I am surprised our tests didnt catch any of the "bad rid" errors...

@jespertheend
Copy link
Contributor Author

Thanks! I'll try bisecting those commits tomorrow. I think the issue is only noticeable when there are a lot of connections. (We have about 200 connections per server). That could explain why the tests didn't catch it.

@littledivy
Copy link
Member

@jespertheend I'm not able to reproduce Bad Resource IDs with >200 connections. Can you share releavent code around socket.send? Are you calling it after the socket is ready? (i.e. socket.onopen)

@littledivy
Copy link
Member

My theory is that #16320 's optimization made socket.send fast enough to fail before server socket opens if your app was calling it before socket.onopen.

@littledivy littledivy added the needs info needs further information to be properly triaged label Oct 28, 2022
@littledivy
Copy link
Member

littledivy commented Oct 28, 2022

Actually nvm, there is a readyState !== OPEN check in socket.send. But some reproduction will still be helpful

@jespertheend
Copy link
Contributor Author

36307c4 (#16320) doesn't seem to have the issue. I'll try different commits and report if I find anything.

My send code looks like this:

/**
 * @param {number[] | ArrayBufferLike} data
 */
send(data) {
	if (data instanceof Array) {
		data = (new Uint32Array(data)).buffer;
	}
	try {
		if (this.rawConnection.readyState == WebSocket.OPEN) {
			this.rawConnection.send(data);
		}
	} catch (e) {
		console.error("Failed to send message to connection", e);
	}
}

The clients are sending a bunch of data about 20 times per second. So maybe making 200 connections isn't enough on its own to trigger the errors.
I can't tell for sure but it also seems like cpu usage is a lot higher than with the previous version. So it could also be that something else broke and the bad resource id errors are simply a result of the server being overloaded. Cpu usage seems to be bouncing between 100% and 50%, but it's interesting to note that the bad resource id errors are still happening even when cpu usage is at 50%.

@jespertheend
Copy link
Contributor Author

Seems like it started happening in e3a3095 (bad) there's no errors in 743fcc0 (good)
I don't think the cpu usage has much to do with it because it was using 20% cpu for quite a while just now and I was still getting errors.

@littledivy
Copy link
Member

@jespertheend can you try to reproduce this in canary? I was not able to reproduce the errors so not entirely sure if #16454 fixed it

@jespertheend
Copy link
Contributor Author

Unfortunately that does not seem to fix it, I do notice a difference though. It seems like it no longer throws an error during WebSocket.send(). Previously my logs used to be full of stacktraces pointing to the try catch block around WebSocket.send(). But now it simply logs BadResource: Bad resource ID without a stack trace. Users still lose the connection after about a minute though.

@andreakarasho
Copy link

we are experiencing the same issue: socket just dies/hangs on sending

@jespertheend
Copy link
Contributor Author

I guess this doesn't come as a surprise, but I tried f5cb26a on our production servers and can confirm that it's fixed.

@bartlomieju
Copy link
Member

Good to hear @jespertheend. It will be released today in v1.27.1

DjDeveloperr pushed a commit to DjDeveloperr/deno that referenced this issue Nov 4, 2022
DjDeveloperr pushed a commit to DjDeveloperr/deno that referenced this issue Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly high priority needs info needs further information to be properly triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants