3.7 streaming transactions: ArangoError: cluster internal HTTP connection broken #699
Comments
Hi, |
@dothebart what 3 URIs should I use for the Oasis cluster in my PR? |
@pluma can you give a hint for this? |
@dothebart I've added support for passing multiple URLs separated by commas via ab866b0. Check the changes to Note that, as a result, cluster mode always enables round robin. |
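To illustrate what "multiple URLs with commas" plus round robin amounts to, here is a minimal sketch. It is not the code from ab866b0; the helper names `parseCoordinatorUrls` and `roundRobin` and the coordinator hostnames are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helper: split a comma-separated list of coordinator URLs,
// trimming whitespace and dropping empty entries.
function parseCoordinatorUrls(raw: string): string[] {
  return raw
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
}

// Minimal round-robin picker: each call returns the next URL in turn
// and wraps around after the last one.
function roundRobin(urls: string[]): () => string {
  if (urls.length === 0) {
    throw new Error("at least one URL is required");
  }
  let next = 0;
  return () => {
    const url = urls[next];
    next = (next + 1) % urls.length;
    return url;
  };
}

// Placeholder hostnames, not taken from the thread.
const urls = parseCoordinatorUrls(
  "http://coordinator1:8529, http://coordinator2:8529,http://coordinator3:8529"
);
const pick = roundRobin(urls);

// Four consecutive requests cycle through all three coordinators,
// then wrap around to the first one again.
const targets = [pick(), pick(), pick(), pick()];
```

With round robin, consecutive requests belonging to one streaming transaction can land on different coordinators, which is the situation discussed in this issue.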
Hi @mikestaub, I hope you are doing well. Alan and Willi are working on extending the automatic testing to catch these issues more easily. What is the current status? Is it blocking you from moving to 3.7, or did you work around it? Best, Frank |
@fceller this is blocking me from upgrading to 3.7 but it is not urgent as 3.6 is working well. |
Hm, simply running the tests with 3 coordinators doesn't reproduce this. |
@dothebart here is the docker-compose.yml file I am using:
|
This seems to be missing the nginx config file. |
this is the nginx.conf
|
OK, the immediate workaround is to configure nginx to use TCP proxying instead of HTTP proxying by swapping the
We will dig deeper on the real reason later. |
@dothebart any updates on the root cause? I am seeing these errors in Oasis on v3.7.5 as I assume envoy uses TCP not HTTP. |
Hi, |
Hello! Problem with Can you create an Oasis issue? Then we will be able to look at your deployment (we will check the internal reason). Best regards, |
Here is the Oasis issue: https://arangodb.atlassian.net/servicedesk/customer/portal/13/OASIS-418 |
@ajanikow I think the long-term solution is to provide a way to run the Oasis cluster locally so I can run my integration tests against it and be confident it will work once deployed, either with a docker-compose file or k8s Helm charts. |
OK, the current situation is that ArangoDB forwards all [most] HTTP headers that it receives on one coordinator to the one that owns the cursor. However, not forwarding the This bugfix is going to be part of the upcoming 3.7.6 release. |
@dothebart great, thanks for tracking this down. In the meantime, can I manually remove that header from the requests being sent by arangojs? Do you have an ETA for when 3.7.6 will be available on Oasis? |
At least in your test case, this header is added by the NGINX proxy. As @ajanikow pointed out, running it in TCP mode also prevents the situation from arising. |
I think it might also be a timing issue in the arangojs task queue. After adding this to my Database config, the issue disappeared:
|
Hi, Happy New Year ;) If that is all it is, is the issue reproducible when you use a cloud VM close to your Oasis cluster? |
Happy new year! The error was still appearing in my local env (not Oasis), even with TCP routing enabled. You should be able to reproduce it with that docker-compose file. I assume that docker-compose file is a good approximation of an Oasis setup. I saw the same errors when running on lambda connecting to Oasis. I actually think it may be a bug in arangojs, as it might be firing the HTTP requests in the wrong order (committing the transaction before it was created). |
I just confirmed that this issue is still present in 3.7.7. |
We can probably close this as a duplicate of #702 (comment), since that issue now contains a more precise description and test cases of the actual behaviour. WDYT? |
arangojs 7.3.0 changes behavior related to |
I still have to set maxSockets=1 with arangojs@7.5.0 and arangodb@3.7.7. |
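For readers looking for the shape of this workaround, the following is a rough sketch only. It assumes that the arangojs 7.x `Database` constructor accepts an `agentOptions` object with a `maxSockets` field; the exact snippet used in this thread is not shown, so the helper `singleSocketConfig`, the local `SingleSocketConfig` type, and the URL are hypothetical stand-ins.

```typescript
// Local stand-in for the relevant part of the arangojs 7.x config shape
// (assumption: `agentOptions.maxSockets` is passed through to the HTTP agent).
interface SingleSocketConfig {
  url: string | string[];
  agentOptions: { maxSockets: number; keepAlive: boolean };
}

// Hypothetical helper that builds a config limiting the driver to one socket
// per host, which should keep requests from being sent concurrently.
function singleSocketConfig(url: string | string[]): SingleSocketConfig {
  return { url, agentOptions: { maxSockets: 1, keepAlive: true } };
}

const config = singleSocketConfig("http://localhost:8529"); // placeholder URL

// With arangojs installed, the config would be used roughly like this:
//   import { Database } from "arangojs";
//   const db = new Database(config);
```

Limiting the driver to a single socket trades throughput for ordering, so it is better treated as a diagnostic workaround than as a fix.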
Hi, |
I'm closing this due to inactivity. Please follow the directions provided above if the problem still persists. |
This branch was working on 3.6.3, but is now failing on 3.7.3
https://github.com/mikestaub/arangojs/pull/1/files
Would it be possible to include a default Oasis cluster in the integration tests so these types of regressions could be caught earlier?