ArangoError: AQL: got invalid vpack value on lookup #3375
I tried to debug which AQL query causes this problem, and it is probably this one:
LET list = (
FOR t IN vTask
FILTER t.status IN ["ready", "run"]
SORT t.primary desc, t.status desc, rand()
RETURN t
)
LET existsPrimaryTasks = LENGTH(FOR t in vTask FILTER t.primary == true && t.status in ['ready', 'run'] RETURN 1)
FOR l IN list
FILTER l.status == 'ready' && (existsPrimaryTasks > 0 ? l.primary == true : 1 == 1)
//Sort ip by /24
LET net = SUBSTRING(l.ip, 0, FIND_LAST(l.ip, '.'))
COLLECT prefix = net into g
LET ips = g[*].t.ip
LET data = UNIQUE(FLATTEN(
FOR idx IN 0..(LENGTH(ips) - 1)
LET ip = ips[idx]
RETURN {ip, idx}
))
FOR d IN data
SORT d.idx
LIMIT 250 //We want send 250 ip max in 1 round
FOR t IN vTask
FILTER t._key == d.ip
RETURN t
It was written by my colleague; the ternary operator in the FILTER seems suspicious to me. |
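For readers unfamiliar with the two expressions in the query above, here is a minimal Python sketch of what the ternary FILTER and the /24 prefix expression compute. It is only a model under stated assumptions: the function names are hypothetical, `l.primary == true` is modelled as a strict `is True` check, and the prefix helper only covers strings that contain a dot. It does not imply that either expression causes the vpack error.

```python
from collections import defaultdict


def net_prefix(ip: str) -> str:
    """Model of SUBSTRING(ip, 0, FIND_LAST(ip, '.')) for strings containing a dot."""
    pos = ip.rfind(".")
    if pos < 0:
        # Assumption: inputs without a dot are out of scope for this sketch.
        raise ValueError("expected a dotted address")
    return ip[:pos]


def passes_original(status, primary, exists_primary_tasks: int) -> bool:
    """Model of the FILTER with the ternary operator, as written in the query."""
    cond = (primary is True) if exists_primary_tasks > 0 else (1 == 1)
    return status == "ready" and cond


def passes_simplified(status, primary, exists_primary_tasks: int) -> bool:
    """The same condition without the ternary (the count is never negative)."""
    return status == "ready" and (exists_primary_tasks == 0 or primary is True)


def group_by_prefix(ips):
    """Group addresses by prefix, in the spirit of COLLECT prefix = net INTO g."""
    groups = defaultdict(list)
    for ip in ips:
        groups[net_prefix(ip)].append(ip)
    return dict(groups)
```

Under these assumptions the ternary is just a longer spelling of `existsPrimaryTasks == 0 || l.primary == true`, so it is unusual in style but logically straightforward.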
So it also crashes with queries other than the one above. We tried 3.1.27 and got no errors. |
When you dump the state you've got and restore it into a new ArangoDB on another box, can you still reproduce this? |
Well, when I manually run the query that failed in our code, it's OK, so there is no state that I can dump and try on another box. But when I run our app (it uses dozens of queries) on different boxes (with a clean DB) against different use cases, it fails after some time, so far every time. So it's quite reproducible, but unfortunately I can't just take a dump and have the query fail. Any ideas how to troubleshoot it? |
Can you try to use a load-testing suite like JMeter to load your application on another system to reproduce this? Compiling arangodb with |
I don't have any experience with JMeter, but our app consists of several processes communicating over RabbitMQ, and to get the vpack error it has to run against our live customer network. I compiled arango with the backtrace option, started it with the console, and typed that command. But then I'm a bit lost as to how to continue. I reproduced the error, but was not able to collect a backtrace with gdb, as it gives me an error after attaching to arangod: [New LWP 1600] Sorry, I don't have much experience with this kind of debugging. Could you please give me the next steps to collect a backtrace that would be useful for you? |
First, you should see the C++ stack traces within the JavaScript stack traces. As for GDB you could do: |
I don't see any C++ stack trace within the JavaScript stack trace:
2017-11-01T16:44:30.380Z - error: ArangoError: AQL: got invalid vpack value on lookup (exception location: /usr/local/src/arangodb/arangod/MMFiles/MMFilesCollection.cpp:2959). Please report this error to arangodb.com (while executing) (exception location: /usr/local/src/arangodb/arangod/RestHandler/RestCursorHandler.cpp:130). Please report this error to arangodb.com
I compiled arango with -DUSE_BACKTRACE=1 and typed ENABLE_NATIVE_BACKTRACES(true) in the console.
The problem with a GDB breakpoint is that the app has to run for around 5 minutes until the error is thrown, I don't know exactly when the error will occur, and that line of code is called all the time. Any other ideas? We are now stuck on 3.1 and we need to move back to 3.2, but we can't with this error :( |
If you are in a position to compile 3.2 from source (as it seems), then the easiest approach from my perspective is to apply the following patch to the source code and recompile it:
diff --git a/arangod/MMFiles/MMFilesCollection.cpp b/arangod/MMFiles/MMFilesCollection.cpp
index 8902e92433..6e77d1d680 100644
--- a/arangod/MMFiles/MMFilesCollection.cpp
+++ b/arangod/MMFiles/MMFilesCollection.cpp
@@ -2958,6 +2958,7 @@ uint8_t const* MMFilesCollection::lookupRevisionVPack(
       TRI_ASSERT(VPackSlice(vpack).isObject());
       return vpack;
     }
+  TRI_LogBacktrace();
   THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INTERNAL,
                                  "got invalid vpack value on lookup");
 }
This will make arangod log a warning every time the problem occurs. The warning will include a backtrace including the call stack.
Each of the addresses contained (e.g.
and if applied to the relevant frames in the call stack, it will show the call stack for the function that triggers the problem. If for any reason an assertion should be triggered, the violated assertion should be logged to ArangoDB's regular log. The binary will also terminate with std::abort, which will produce a coredump if coredumps are enabled. If you could enable coredumps in your system that would also be super-useful because they may provide some further valuable information. If you start arangod manually from the command line, then something like |
Hi Jan, thanks for your help! Here is the stack trace:
arangod> ENABLE_NATIVE_BACKTRACES(true)
Here is the address resolution:
/usr/local/src/arangodb/lib/Basics/debugging.cpp:286
I run arango from the console and used ulimit, but can't find a coredump. I will try harder to generate/find it. Please let me know if the debug output above helped or if I can do anything else. |
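When a core file cannot be found after setting ulimit, it can help to check which core-size limit the launching shell actually passes on and where the kernel writes core files. The following is a small sketch, assuming a Linux host; the function name `core_dump_settings` is hypothetical. It reports the limits of the Python process itself, so it only reflects arangod's limits if both are started from the same shell.

```python
import resource
from pathlib import Path


def core_dump_settings() -> dict:
    """Report the core-file size limit and the kernel core pattern (Linux)."""
    # The soft limit is the one in effect; 0 means no core file is written.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        # On Linux this file says where core files go (a path template or a pipe).
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        pattern = None  # not Linux, or the file is not readable
    return {
        "soft": soft,
        "hard": hard,
        "enabled": soft != 0,
        "core_pattern": pattern,
    }


if __name__ == "__main__":
    print(core_dump_settings())
```

If `core_pattern` starts with a pipe character, core files are handed to a helper program rather than written to the working directory, which is one common reason they seem to be missing.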
It looks like some of the hash index contents are invalid at the point they are queried from AQL. |
I did not notice any errors except this problem. I will try to run it against a completely new, clean database just to be sure. Here is the backtrace, hope it helps. |
It fails for an AQL query that begins with |
Are there any updates/inserts/removes to that collection in the workload that produces the problem? If so, roughly which ones? |
I sent you the exact commands at hackers@arangodb.com. There are two processes accessing this collection, both doing inserts and updates pretty intensively. No removes. |
If I am not wrong, we fixed this issue in 3.2.9. |
Hi @Roman1720, could you please check if this error still occurs in recent versions 3.2.9+? |
Yes it's fixed in 3.2.9 |
my environment running ArangoDB
I'm using the latest ArangoDB of the respective release series:
Mode:
Storage-Engine:
On this operating system:
this is an AQL-related issue:
[* ] I'm using graph features
I'm issuing AQL via:
Unfortunately I only have the error message and I don't know how to reproduce it:
ArangoError: AQL: got invalid vpack value on lookup (exception location: /var/lib/otherjenkins/workspace/RELEASE__BuildPackages/arangod/MMFiles/MMFilesCollection.cpp:2960). Please report this error to arangodb.com (while executing) (exception location: /var/lib/otherjenkins/workspace/RELEASE__BuildPackages/arangod/RestHandler/RestCursorHandler.cpp:130). Please report this error to arangodb.com