Intermittent server error when joining chat room #59
Comments
Are there any server error logs? A timeout such as this usually indicates an uncaught exception on the server. |
Thanks for opening the issue.
Is this what you requested? Please contact us if you need any other information. |
@hdyngd Thanks for the server log. This is definitely part of the problem. We can see here that the server is asking the Chat system to join the chat room, and that the chat room is not responding. Therefore the server does not respond to the client, and you receive the timeout message. When this happens, it is typically the case that an earlier error has somehow corrupted the chat room. The error that you copied here is part of the symptom. The cause is that either the ChatActor has died or it is in a bad state. Usually this would be caused by some "original" error. That error would not be displayed each time a user tries to join; it would probably show up just before the first occurrence of the error you are seeing. So the error is probably further back in the log. If you can identify the steps to reproduce the behavior, or if you can find another error in the log before this one, it would be very helpful. In the meantime, we are trying to replicate the error. |
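To illustrate the mechanism described above (a request that never gets a reply surfaces to the caller as a timeout), here is a minimal sketch. The `withTimeout` helper is hypothetical and is not part of the Convergence client; only the error message text is taken from this issue.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms`.
// This only illustrates the symptom; it is not Convergence's implementation.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("A request timeout occurred.")),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A request that is never answered, standing in for a chat room that
// does not respond to a join.
const unansweredJoin = new Promise(() => {});

withTimeout(unansweredJoin, 100).catch(err => console.log(err.message));
```

The point of the sketch is that the timeout itself carries no information about the cause, which is why the earlier server-side error is the more useful thing to find.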
@mmacfadden Thanks for the detailed explanation.
[Procedure accompanying this log output]
Is there any information you were looking for in this? |
@hdyngd Thanks for the logs. A few more questions:
1. Is the room you are joining an existing chat room that was previously created?
2. If the answer to 1 is yes, can you join a new chat room if you create one?
3. Were you ever able to join the chat room? If so, did it then break at some point in time? |
@mmacfadden Thanks for the question.
-> No. This chat room was first created.
-> I can join immediately after creation, and it works fine. However, when I try to rejoin, I may join successfully or I may get a timeout error. If I can rejoin, it works fine. Once a timeout error occurs, no one can join the room again. The operations and data are exactly the same whether the join succeeds or fails, so I don't know why the timeout error occurs. This may be unnecessary information, but I also use real time models in addition to the chat room. Do you notice anything about this? |
@hdyngd Thanks for the additional information. We are continuing to look into the issue. A few more questions:
|
We performed the following test: We ran the docker container like this:
Then we created the following script:

    const chatId = "test-chat-id";
    const domainUrl = "ws://localhost:8000/api/realtime/convergence/default";
    const displayName = "test user";
    const timeout = 2000;
    let iterations = 1;

    // Connect once, then create/join/leave the same room in a loop.
    Convergence.connectAnonymously(domainUrl, displayName).then(async (domain) => {
      while (true) {
        await joinAndLeave(domain);
        // Pause between iterations.
        await new Promise(resolve => setTimeout(resolve, timeout));
      }
    });

    async function joinAndLeave(domain) {
      console.log("Iteration " + iterations++);
      return domain.chat()
        .create({
          id: chatId,
          type: "room",
          membership: "public",
          ignoreExistsError: true
        })
        .then(channelId => {
          console.log(`Channel Created: ${channelId}`);
          return domain.chat().join(chatId);
        })
        .then((channel) => {
          console.log("Channel Joined");
          // Leave the room explicitly before the next iteration.
          return channel.leave();
        })
        .then(() => {
          console.log("Channel Left");
        });
    }

We let this run for a couple hundred iterations and did not see an error. We are wondering if you can run this script on a fresh run of the system and see if it works. If it breaks, can you provide the log from the server side? |
@mmacfadden Thanks for continuing to look into the issue. I'll answer the questions.
-> We are using JWT.
-> Yes. When the chat room issue comes up, I can open models, join another chat room, or create a new chat room.
-> Yes. I can see it.
-> I can't delete. An error will be displayed if you try to do so. [Error log when deleting chatroom]
(This is additional information.) |
@mmacfadden We did the same test, running your script above unchanged. The result was the same, with no errors. But this script is a little different from my implementation, so I modified it to be closer to my implementation.

[fixed script]

    const chatId = "test-chat-id";
    const domainUrl = "ws://localhost:8000/api/realtime/convergence/default";
    const displayName = "test user";
    const timeout = 2000;
    let iterations = 1;

    // Connect once to obtain a reconnect token, then dispose of that connection.
    Convergence.connectAnonymously(domainUrl, displayName).then(
      async domain => {
        const reconnectToken = domain.session().reconnectToken();
        domain.dispose();
        let i = 0;
        while (i <= 300) {
          await joinAndLeave(reconnectToken);
          await new Promise(resolve => setTimeout(resolve, timeout));
          i++;
        }
      }
    );

    // Each iteration reconnects, creates (or reuses) the room, joins it, and
    // then disposes of the domain without calling leave() on the channel.
    async function joinAndLeave(reconnectToken) {
      console.log("Iteration " + iterations++);
      let domain = null;
      return Convergence.reconnect(domainUrl, reconnectToken)
        .then(dom => {
          domain = dom;
          return domain.chat().create({
            id: chatId,
            type: "room",
            membership: "public",
            ignoreExistsError: true
          });
        })
        .then(channelId => {
          console.log(`Channel Created: ${channelId}`);
          return domain.chat().join(chatId);
        })
        .then(() => {
          console.log("Channel Joined");
          return domain.dispose();
        })
        .then(() => {
          console.log("Channel Left");
        });
    }

This will reproduce it. This is a guess, but it may be due to not explicitly executing |
@hdyngd Thanks. I believe we are getting closer. Here are some additional details. Each ChatRoom has a persistent ChatActor in the system that manages joining, leaving, sending messages, etc. It starts up when the first message goes to the chat room. After the last person leaves the chat room, it shuts down. If that ChatActor somehow gets into a bad state, it is possible that it will then refuse incoming requests. If the server is restarted, all ChatActors shut down, and you would likely be able to join again until the issue comes up again. Your experience suggests that this is what is happening. If I can use your script to replicate the issue, we should be able to fix it quickly. Also, I will check the admin console and see if we have a bug to fix there for listing the chat rooms. If there is, I will open another issue. We have an upcoming release very soon that we can get these fixes into within the next few days if we come to a solution. |
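The lifecycle described above can be sketched as a toy model. This is only an illustration of the explanation in this comment, not the server's actual actor code: the class names, the `badState` flag, and the message shapes are all hypothetical.

```javascript
// Toy model of one chat room's actor (hypothetical, for illustration only).
class ChatActorSketch {
  constructor() {
    this.members = new Set();
    this.badState = false; // stands in for "some earlier error corrupted it"
  }

  // Returns a reply, or undefined when the actor does not respond.
  receive(message) {
    if (this.badState) return undefined; // caller would see a timeout
    if (message.type === "join") this.members.add(message.user);
    if (message.type === "leave") this.members.delete(message.user);
    return { ok: true, memberCount: this.members.size };
  }
}

// Toy model of the server side that owns the actors (hypothetical).
class ChatServerSketch {
  constructor() {
    this.actors = new Map();
  }

  send(chatId, message) {
    // The actor starts up when the first message goes to the chat room.
    if (!this.actors.has(chatId)) this.actors.set(chatId, new ChatActorSketch());
    const reply = this.actors.get(chatId).receive(message);
    // After the last member leaves, the actor shuts down.
    if (reply && reply.memberCount === 0) this.actors.delete(chatId);
    return reply;
  }

  // A server restart discards every actor, including ones in a bad state.
  restart() {
    this.actors.clear();
  }
}
```

In this model a room whose actor is in a bad state never empties, so the actor is never replaced and every later join goes unanswered until `restart()`, which matches the behavior reported in this thread.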
@mmacfadden Thanks for the detailed explanation. This is a report just in case. Timeout error didn't occur when explicitly executing If possible, I'd like you to continue your investigation. |
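As a sketch of the workaround this report seems to describe, the function below leaves the chat explicitly before disposing of the domain. It assumes the call meant here is `channel.leave()`, as used in the maintainers' test script; `joinLeaveAndDispose` is a hypothetical name, and `domain` is expected to be a connected Convergence domain like the one in the scripts above.

```javascript
// Hypothetical helper: join a chat, then leave it explicitly before
// disposing of the domain. Assumes domain.chat().join(), channel.leave(),
// and domain.dispose() behave as in the scripts earlier in this thread.
async function joinLeaveAndDispose(domain, chatId) {
  const channel = await domain.chat().join(chatId);
  try {
    // Leave explicitly instead of relying on dispose() to clean up.
    await channel.leave();
  } finally {
    // Dispose even if leaving fails, so the connection is not leaked.
    await domain.dispose();
  }
}
```

This only avoids the trigger on the client side; the underlying server-side problem would still need the fix discussed in this issue.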
@hdyngd We were able to reproduce the issue using the script you provided. It doesn't ALWAYS happen, but it happens often enough that we think we can figure out the root cause. We are continuing to work on a fix. I hope to have something figured out in the next few days. This is one of two critical bugs that we are focusing on right now, so this is definitely a priority for us. Thanks for following up. We will post more info back here in the next day or so. |
@hdyngd I think we have made some progress on a fix. We are going to release another version to see if it fixes your problem. It is possible that we have not fixed the issue. But it seems to be working for us. One other question:
Is this in the Admin Console web interface? If so, if you have more than 10 chats, there should be a pagination control at the bottom of the table. Is that not working for you? |
@mmacfadden Thank you very much for your effort. We will check the fix once another version is released.
Here is the admin console problem in more detail, with a screenshot of our admin console. As you can see, only 11 lines are displayed: 10 lines on page 1 and 1 line on page 2. However, there are actually more chat rooms. If you delete one chat room, a hidden chat room is newly displayed. No more than 11 lines are ever displayed. |
We created a separate issue for the admin console bug. #73 |
This should be fixed. |
@mmacfadden We tried 1.0.0-rc.5 and it worked. No timeout error occurs. Thank you very much!! |
@hdyngd Great! |
Versions
client: @convergence/convergence 1.0.0-rc.4
server: convergencelabs/convergence-omnibus
Describe the Bug
When connecting to some chat room, the following error occurs and connecting fails.
Error: A request timeout occurred.
Once the error occurs, the same error occurs 100% of the time from any client when connecting to that chat room. There may be a problem with our implementation, but since no other client can connect to the same chat room once this happens, we suspect that it is a Convergence server side problem.
Steps To Reproduce
We are not sure exactly how to reproduce the issue. Here is our example code: