Issue when running ArangoDB3 on DC/OS under AWS Cloud formation. #1947

Closed
lalitlogical opened this issue Jul 19, 2016 · 23 comments

@lalitlogical

Platform: AWS CloudFormation
Version: ArangoDB 3.0 on DC/OS in cluster mode
arangojs: ^4.3.0

I followed the tutorial below to install ArangoDB 3 on DC/OS.
https://dcos.io/docs/1.7/usage/tutorials/arangodb/

After a successful installation, I used sshuttle to reach the ArangoDB service from outside the cluster, so I can use ArangoDB from my Node.js project with arangojs.

But I am getting the error below.

DB SERVER ANSWERED WITH ERROR: {"error":true,"errorMessage":"Error message received from shard '' on cluster node 'Coordinator001': initializeCursor lead to an exception (while executing)","code":400,"errorNum":1590}
 (exception location: /usr/src/packages/BUILD/arangod/Aql/ExecutionEngine.cpp:573). Please report this error to arangodb.com (while executing) (exception location: /usr/src/packages/BUILD/arangod/RestHandler/RestCursorHandler.cpp:129). Please report this error to arangodb.com

Is something wrong on my end, or is this issue caused by the ArangoDB service?

Thanks

@kvahed
Contributor

kvahed commented Jul 19, 2016

Hi Lalit,

Could you please give me the specifics of the call that leads to the above error?

Kind regards,
Kaveh

@lalitlogical
Author

@kvahed, when I execute my AQL queries, they throw the above error.

I am using the ArangoDB cluster service under DC/OS, set up with AWS CloudFormation. I connected my Node.js project to the ArangoDB service as described below.

arangodb/arangodb-dcos#8

@lalitlogical
Author

What is the best shard count for a collection when we have 2 coordinators, 2 DB servers, and 3 agents?

I recently created another collection with 2 shards. It again fails on AQL queries that already work on a normal ArangoDB server (without a cluster).

@dothebart
Contributor

@lalitlogical, please give a reproducible example of that incident; otherwise we will not be able to help you.

@lalitlogical
Author

@dothebart, maybe I have found the cause of the issue.

It happens for a specific query that contains a custom AQL function, which I never added to this installation.

Some time ago I asked for a custom function for calculating the distance between two coordinates. Later I wrote it myself and added it to ArangoDB through arangosh, but I forgot to add it to this installation. It may be unavailable when this query runs, and that causes the issue. The first few queries actually run fine; this query contains the custom distance calculation function, which may be the cause.

In the past I also forgot it on a new normal ArangoDB server, but there the error reported the undefined geo::gdistance function. Here the error does not point to the undefined function. Maybe the error message needs improving.

But how do I add my custom functions here? Where do I need to run arangosh?

I will also send you the whole query, which you may be able to use to reproduce the issue and produce a proper error message. Hope it is due to the above :p :)

@lalitlogical
Author

@dothebart, would it be possible to have an option in the web interface for adding custom AQL functions, like OrientDB provides? It would help ordinary users add their custom functions easily without accessing the arangosh terminal.

I hope it will be available in a future ArangoDB release :) :)

@lalitlogical
Author

Hi, can anyone let me know how I can add my custom AQL function to an ArangoDB cluster service that contains 2 coordinators, 2 DB servers, etc.? Where can I run arangosh under DC/OS? Thanks

@dothebart
Contributor

You deploy the UDF via the coordinator, i.e. using arangosh.

Taking this example:

/* /tmp/test.js */
'use strict';

function greeting(name) {
    if (name === undefined) {
        name = "World";
    }
    return `Hello ${name}!`;
}

module.exports = greeting;

Now you load the function file into memory and save it to arangodb:

var aqlfunctions = require("@arangodb/aql/functions");
var func = require("/tmp/test.js");
aqlfunctions.register("HUMAN::GREETING", func, true);

Now sending the query will work:

db._query("RETURN HUMAN::GREETING('Spencer')")
[ 
  "Hello Spencer!" 
]

The error thrown when calling a nonexisting function reads ok to me:

db._query("RETURN HUMAN::GGGGREETING('blarg')")
JavaScript exception in file '/local/home/willi/src/devel/js/client/modules/@arangodb/arangosh.js' at 97,7: ArangoError 1582: user function 'HUMAN::GGGGREETING()' not found
stacktrace of offending AQL function: ArangoError: user function 'HUMAN::GGGGREETING()' not found

Since the function lives in the _aqlfunctions collection, it's instantly available to new clusters.

@lalitlogical
Author

lalitlogical commented Jul 29, 2016

@dothebart Thanks. I had already tried this before, but today it succeeded.

  1. I used sshuttle to access the cluster from my local machine:

sshuttle --python /opt/mesosphere/bin/python3.4 -r core@my-ip-address 10.0.0.0/8

Then I was able to access the ArangoDB web UI in my browser via the local IP address, i.e. http://10.0.1.221:14170

  2. I accessed the remote arangosh as below:

arangosh --server.endpoint tcp://10.0.1.221:14170 --server.username root

Then I registered the distance function as below.

db._useDatabase("GeoConnect");
var aqlfunctions = require("@arangodb/aql/functions");
var f = require("distance.js");
aqlfunctions.register("geo::gdistance", f, true)
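
The contents of distance.js are not shown in this thread. As a rough illustration only, a file of that kind could look like the sketch below. It assumes a haversine great-circle formula, a spherical Earth radius of 6371 km, and a result in kilometres; the name `gdistance` and all of these choices are assumptions, not the author's actual code.

```javascript
/* distance.js: hypothetical sketch, not the file used in this thread */
'use strict';

// Great-circle distance in kilometres between two points given in degrees.
// Uses the haversine formula on a sphere of radius 6371 km (an assumption).
// The function touches only its arguments and Math, so it does not rely on
// any variables outside its own body.
function gdistance(lat1, lon1, lat2, lon2) {
  var toRad = Math.PI / 180;
  var dLat = (lat2 - lat1) * toRad;
  var dLon = (lon2 - lon1) * toRad;
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(lat1 * toRad) * Math.cos(lat2 * toRad) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  // Clamp to 1 so rounding noise cannot push asin() out of its domain.
  return 2 * 6371 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Export the function so require("distance.js") returns it; the guard only
// keeps the file loadable where no CommonJS `module` object exists.
if (typeof module !== 'undefined') {
  module.exports = gdistance;
}
```

Keeping the function free of outside references matters here because the registered function is stored on the server and evaluated there, away from the file it came from.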

Actually, I had tried this several times before with no success, but today it works. When I looked at the Mesos dashboard, coordinator1, coordinator2, agent3, and agent2 had FAILED and started again. That may be why it did not work at that time.

Thanks

@dothebart
Contributor

Ok, I guess we can close this issue now then.

@lalitlogical
Author

Hi @Simran-B, I added my custom function as above. My application ran properly for about half an hour. After that, I got the same kind of issue again, as below.

{ [ArangoError: Communication with shard '' on cluster node '' failed :  (exception location: /usr/src/packages/BUILD/arangod/Aql/ExecutionEngine.cpp:573). Please report this error to arangodb.com (while executing) (exception location: /usr/src/packages/BUILD/arangod/RestHandler/RestCursorHandler.cpp:129). Please report this error to arangodb.com]
  name: 'ArangoError',
  message: 'Communication with shard \'\' on cluster node \'\' failed :  (exception location: /usr/src/packages/BUILD/arangod/Aql/ExecutionEngine.cpp:573). Please report this error to arangodb.com (while executing) (exception location: /usr/src/packages/BUILD/arangod/RestHandler/RestCursorHandler.cpp:129). Please report this error to arangodb.com',
  errorNum: 4,
  code: 500,
  stack: 'ArangoError: Communication with shard \'\' on cluster node \'\' failed :  (exception location: /usr/src/packages/BUILD/arangod/Aql/ExecutionEngine.cpp:573). Please report this error to arangodb.com (while executing) (exception location: /usr/src/packages/BUILD/arangod/RestHandler/RestCursorHandler.cpp:129). Please report this error to arangodb.com\n    at new ArangoError (/Users/standarduser/Documents/Lalit/CurrentProject/GeoConnect/node_modules/arangojs/lib/error.js:24:15)\n    at /Users/standarduser/Documents/Lalit/CurrentProject/GeoConnect/node_modules/arangojs/lib/connection.js:126:19\n    at callback (/Users/standarduser/Documents/Lalit/CurrentProject/GeoConnect/node_modules/arangojs/lib/util/request.js:90:12)\n    at IncomingMessage.<anonymous> (/Users/standarduser/Documents/Lalit/CurrentProject/GeoConnect/node_modules/arangojs/lib/util/request.js:98:11)\n    at IncomingMessage.emit (events.js:129:20)\n    at _stream_readable.js:908:16\n    at process._tickDomainCallback (node.js:381:11)',
  response: 
   { _readableState: 
      { objectMode: false,
        highWaterMark: 16384,
        buffer: [],
        length: 0,
        pipes: null,
        pipesCount: 0,
        flowing: true,
        ended: true,
        endEmitted: true,
        reading: false,
        sync: true,
        needReadable: false,
        emittedReadable: false,
        readableListening: false,
        defaultEncoding: 'utf8',
        ranOut: false,
        awaitDrain: 0,
        readingMore: false,
        decoder: null,
        encoding: null,
        resumeScheduled: false },
     readable: false,
     domain: null,
     _events: { end: [Object], data: [Function] },
     _maxListeners: undefined,
     socket: 
      { _connecting: false,
        _hadError: false,
        _handle: [Object],
        _host: '10.0.1.218',
        _readableState: [Object],
        readable: true,
        domain: null,
        _events: [Object],
        _maxListeners: undefined,
        _writableState: [Object],
        writable: true,
        allowHalfOpen: false,
        destroyed: false,
        bytesRead: 9111,
        _bytesDispatched: 6908,
        _pendingData: null,
        _pendingEncoding: '',
        parser: null,
        _httpMessage: [Object],
        read: [Function],
        _consuming: true },
     connection: 
      { _connecting: false,
        _hadError: false,
        _handle: [Object],
        _host: '10.0.1.218',
        _readableState: [Object],
        readable: true,
        domain: null,
        _events: [Object],
        _maxListeners: undefined,
        _writableState: [Object],
        writable: true,
        allowHalfOpen: false,
        destroyed: false,
        bytesRead: 9111,
        _bytesDispatched: 6908,
        _pendingData: null,
        _pendingEncoding: '',
        parser: null,
        _httpMessage: [Object],
        read: [Function],
        _consuming: true },
     httpVersionMajor: 1,
     httpVersionMinor: 1,
     httpVersion: '1.1',
     complete: true,
     headers: 
      { server: 'ArangoDB',
        connection: 'Keep-Alive',
        'content-type': 'application/json; charset=utf-8',
        'content-length': '388' },
     rawHeaders: 
      [ 'Server',
        'ArangoDB',
        'Connection',
        'Keep-Alive',
        'Content-Type',
        'application/json; charset=utf-8',
        'Content-Length',
        '388' ],
     trailers: {},
     rawTrailers: [],
     _pendings: [],
     _pendingIndex: 0,
     upgrade: false,
     url: '',
     method: null,
     statusCode: 500,
     statusMessage: 'Internal Server Error',
     client: 
      { _connecting: false,
        _hadError: false,
        _handle: [Object],
        _host: '10.0.1.218',
        _readableState: [Object],
        readable: true,
        domain: null,
        _events: [Object],
        _maxListeners: undefined,
        _writableState: [Object],
        writable: true,
        allowHalfOpen: false,
        destroyed: false,
        bytesRead: 9111,
        _bytesDispatched: 6908,
        _pendingData: null,
        _pendingEncoding: '',
        parser: null,
        _httpMessage: [Object],
        read: [Function],
        _consuming: true },
     _consuming: true,
     _dumped: false,
     req: 
      { domain: null,
        _events: [Object],
        _maxListeners: undefined,
        output: [],
        outputEncodings: [],
        outputCallbacks: [],
        writable: true,
        _last: false,
        chunkedEncoding: false,
        shouldKeepAlive: true,
        useChunkedEncodingByDefault: true,
        sendDate: false,
        _removedHeader: [Object],
        _hasBody: true,
        _trailer: '',
        finished: true,
        _hangupClose: false,
        _headerSent: true,
        socket: [Object],
        connection: [Object],
        _header: 'POST /_db/GeoConnect/_api/cursor HTTP/1.1\r\ncontent-type: application/json\r\ncontent-length: 174\r\nx-arango-version: 20300\r\nHost: 10.0.1.218:1025\r\nAuthorization: Basic YWRtaW46R2UwQ29ubmVjdA==\r\nConnection: keep-alive\r\n\r\n',
        _headers: [Object],
        _headerNames: [Object],
        agent: [Object],
        socketPath: undefined,
        method: 'POST',
        path: '/_db/GeoConnect/_api/cursor',
        parser: null,
        res: [Circular] },
     read: [Function],
     body: 
      { error: true,
        errorMessage: 'Communication with shard \'\' on cluster node \'\' failed :  (exception location: /usr/src/packages/BUILD/arangod/Aql/ExecutionEngine.cpp:573). Please report this error to arangodb.com (while executing) (exception location: /usr/src/packages/BUILD/arangod/RestHandler/RestCursorHandler.cpp:129). Please report this error to arangodb.com',
        code: 500,
        errorNum: 4 },
     rawBody: '{"error":true,"errorMessage":"Communication with shard \'\' on cluster node \'\' failed :  (exception location: /usr/src/packages/BUILD/arangod/Aql/ExecutionEngine.cpp:573). Please report this error to arangodb.com (while executing) (exception location: /usr/src/packages/BUILD/arangod/RestHandler/RestCursorHandler.cpp:129). Please report this error to arangodb.com","code":500,"errorNum":4}' } }
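
Most of the dump above is Node.js socket and request state. The fields that identify the failure are `code`, `errorNum`, and the message. A minimal sketch of a helper that keeps only those, assuming the error object has the shape shown above (the name `summarizeArangoError` is hypothetical and not part of arangojs):

```javascript
'use strict';

// Hypothetical helper: reduce an error shaped like the dump above to the
// few fields that identify the failure. Falls back to the parsed response
// body when the top-level fields are missing.
function summarizeArangoError(err) {
  var e = err || {};
  var body = (e.response && e.response.body) || {};
  return {
    name: e.name || 'Error',
    code: e.code !== undefined ? e.code : body.code,
    errorNum: e.errorNum !== undefined ? e.errorNum : body.errorNum,
    message: e.message || body.errorMessage || ''
  };
}

// Usage inside a promise chain:
//   .catch(function (err) { console.error(summarizeArangoError(err)); });
```

Logging the summary instead of the whole object also keeps request headers, including the Authorization header, out of logs and bug reports.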

Is this related to arangojs, or is it a connectivity issue between me and the DC/OS cluster? Do I need to raise this issue under arangojs?
Thanks

@dothebart
Contributor

Hi @lalitlogical, it's not in arangojs; it is only the messenger of the bad news.

In general it would be good if you could describe more precisely what you're doing to produce this message, maybe even extend my sample above until it reaches a state that produces this error.

How did you deploy that cluster? Are the nodes close to each other, or are there more hops in between them?

@dothebart dothebart reopened this Aug 2, 2016
@lalitlogical
Author

lalitlogical commented Aug 3, 2016

Hi @dothebart, let me describe the steps I have taken so far. This may help you reproduce the above issue.

  1. I followed the link below to set up the DC/OS environment on AWS.
    https://dcos.io/docs/1.7/administration/installing/cloud/aws/

I only chose the name and the key pair (pem) file and left all other preferences at their defaults. After DC/OS was set up successfully, I installed the DC/OS CLI as described there. This setup gives me 1 master EC2 instance, 1 public EC2 instance, and 5 slave EC2 instances. The DC/OS dashboard (web UI) shows 6 nodes.

  2. From the 'Universe' tab, I installed arangodb3 (1.0.2 was available at that time) with the default configuration.

  3. After a successful installation, I was able to access the arangodb3 web UI. I changed the root password and added a new user with a password for my new database 'GeoConnect'. There I created my own database 'GeoConnect' and the collections required for my project.

  4. Then I used sshuttle to connect it to my Node.js project, as below.

sudo pip install sshuttle
chmod 600 ~/.ssh/ec2keypairfile.pem
ssh-add ~/.ssh/ec2keypairfile.pem
sshuttle --python /opt/mesosphere/bin/python3.4 -r core@your-ip-address 10.0.0.0/8

Then I was able to access its web UI from the local machine (i.e. http://10.0.1.221:14170); the address is listed in the Marathon web UI (/service/marathon/ui/) for the arangodb3 application.

  5. Then I changed my ENV variables so that the Node.js project points to the new ArangoDB service.
var arango = require("arangojs");
var url = "http://"+process.env.ARANGO_DB_SERVER_USER_NAME+":"+process.env.ARANGO_DB_SERVER_USER_PASSWORD+"@"+process.env.ARANGO_DB_SERVER_HOST+":"+process.env.ARANGO_DB_SERVER_PORT;
var db = arango({
  url: url,
  databaseName: "GeoConnect"
});

My AQL queries use a custom AQL function, but I had not added it up to this step. It slipped my mind that my project had a custom AQL function.

  6. My project has a signup feature, which works with the new ArangoDB cluster service. After that it executes the AQL queries that contain the custom AQL function. That is when I got the issue at the top of this thread.

Afterward, I added my custom function as mentioned above. By then, one coordinator and one agent had failed and started again, which I saw in the Mesos web UI. Afterward, I was able to run my app against each of the two coordinators, one at a time. I added the custom AQL function to only one coordinator, since, as you mentioned, it is automatically synced to the other. Then, after half an hour, I got the above issue in my terminal in the Node.js project.

I hope this covers everything. Please let me know if you need anything else regarding this issue.
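
A side note on the connection URL built from the environment variables above: concatenating the raw user name and password breaks as soon as either contains a character such as `@` or `:`. A minimal sketch that percent-encodes the credentials and fails early on missing variables; the helper name `buildArangoUrl` is hypothetical, and the variable names are the ones used in the snippet above:

```javascript
'use strict';

// Hypothetical helper: build the URL passed to arangojs from an object of
// environment variables, percent-encoding the credentials on the way.
function buildArangoUrl(env) {
  var required = [
    'ARANGO_DB_SERVER_USER_NAME',
    'ARANGO_DB_SERVER_USER_PASSWORD',
    'ARANGO_DB_SERVER_HOST',
    'ARANGO_DB_SERVER_PORT'
  ];
  required.forEach(function (name) {
    if (!env[name]) {
      throw new Error('Missing environment variable: ' + name);
    }
  });
  return 'http://' +
    encodeURIComponent(env.ARANGO_DB_SERVER_USER_NAME) + ':' +
    encodeURIComponent(env.ARANGO_DB_SERVER_USER_PASSWORD) + '@' +
    env.ARANGO_DB_SERVER_HOST + ':' + env.ARANGO_DB_SERVER_PORT;
}

// Usage: var url = buildArangoUrl(process.env);
```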

@lalitlogical
Author

Hi,

I currently have the architecture below for DC/OS and the arangodb3 services.

The nodes are on different EC2 instances on AWS, which were created with CloudFormation using the DC/OS CloudFormation template.

1st node:
arangodb3-Agent2 (Failed and started)
arangodb3-DBServer1

2nd node:
arangodb3-Coordinator1 (Failed and started)

3rd node:
arangodb3-Agent1 (Failed and started)

4th node:
arangodb3-DBServer2

5th node:
arangodb3
arangodb3-Agent3
arangodb3-Coordinator2 (Failed and started)

@dothebart
Contributor

dothebart commented Aug 3, 2016

OK, now what's the actual query you send when you get that error?
Are you able to narrow your use case down to when it occurs?

Can you keep an SSH connection open to the coordinators and, once the trouble appears, run

netstat -aplnt |grep TIME_WAIT |wc -l 

and see whether this returns a number > 50
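
If the count should be taken from a script rather than by hand, the same check can be sketched in JavaScript. This assumes the netstat output has already been captured as text; the function names and the sample lines are made up for illustration, and the default threshold of 50 is simply the figure mentioned above, not a hard limit.

```javascript
'use strict';

// Count the lines of captured `netstat -aplnt` output that are in TIME_WAIT.
function countTimeWait(netstatOutput) {
  return netstatOutput.split('\n').filter(function (line) {
    return line.indexOf('TIME_WAIT') !== -1;
  }).length;
}

// True when the TIME_WAIT count is above the threshold (default: 50).
function exceedsThreshold(netstatOutput, threshold) {
  var limit = threshold === undefined ? 50 : threshold;
  return countTimeWait(netstatOutput) > limit;
}

// Made-up sample lines, only to show the expected input format.
var sampleOutput = [
  'tcp  0  0 10.0.1.221:8529  10.0.1.218:41234  TIME_WAIT    -',
  'tcp  0  0 10.0.1.221:8529  10.0.1.218:41240  ESTABLISHED  1234/arangod',
  'tcp  0  0 10.0.1.221:8529  10.0.1.219:50312  TIME_WAIT    -'
].join('\n');
console.log(countTimeWait(sampleOutput), exceedsThreshold(sampleOutput));
```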

@lalitlogical
Author

lalitlogical commented Aug 3, 2016

This is the first query that runs after login and uses the custom AQL function.

FOR p IN posts LET tdistance = geo::gdistance(p.latitude, p.longitude, @latitude, @longitude) FILTER p.deleted == false SORT tdistance, p.created_at desc COLLECT dusers = p.user INTO distinctUsers SORT distinctUsers[*].p.created_at desc FOR user in users FILTER user._key == distinctUsers[0].p.user FOR user_info IN user_infos FILTER user_info.user == distinctUsers[0].p.user LET count = LENGTH(distinctUsers), comments = LENGTH(FOR c IN comments FILTER c.post == distinctUsers[0].p._key FOR tu in users FILTER tu._key == c.user RETURN c), blocked_me = LENGTH(FOR bu IN blocked_users FILTER bu.user == user._key and bu.blocked_user == @user LIMIT 1 RETURN bu), blocked_by_me = LENGTH(FOR bu IN blocked_users FILTER bu.user == @user and bu.blocked_user == user._key LIMIT 1 RETURN bu), distance = geo::gdistance(distinctUsers[0].p.latitude, distinctUsers[0].p.longitude, @latitude, @longitude) FILTER blocked_by_me == 0 AND blocked_me == 0 SORT distance LIMIT @offset, @limit RETURN {'rid': distinctUsers[0].p._key, 'count': count, 'text': distinctUsers[0].p.text, 'image': distinctUsers[0].p.image, 'video': distinctUsers[0].p.video, 'thumbnail': distinctUsers[0].p.thumbnail, 'views': distinctUsers[0].p.views, 'comments': comments, 'shares': distinctUsers[0].p.shares, 'latitude': distinctUsers[0].p.latitude, 'longitude': distinctUsers[0].p.longitude, 'distance': distance, 'ssid': distinctUsers[0].p.ssid, 'ip_address': distinctUsers[0].p.ip_address, 'deleted': distinctUsers[0].p.deleted, 'user': user._key, 'first_name': user.first_name, 'last_name': user.last_name, 'user_picture': user.picture, 'user_picture_thumbnail': user.thumbnail, 'status': user_info.status, 'created_at': distinctUsers[0].p.created_at}

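It is used as below in the Node.js project.

var query = "QUERY AS ABOVE HERE";
db.query(query, {user: req.user.rid, limit: limit, offset: offset, latitude: req.latitude, longitude: req.longitude}).then(function(cursor) {
    // cursor.all() fetches all remaining batches; cursor._result is private
    // and only holds what has been fetched so far
    return cursor.all();
  }).then(function(posts) {
    // use posts here
  }).catch(function (err) {
    // error
  });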

@dothebart
Contributor

Do I understand correctly that you get the errors you've posted above when issuing this particular query?

Tiny note: you should FILTER before you call geo::gdistance.

Is geo::gdistance more or less taken from those examples I pointed you to a while ago?

So if I get that correctly, you don't use a geo index to fetch only the posts that are geographically relevant to you (@latitude, @longitude, @offset)? So basically this will always be a full collection scan?

If you call db._explain() with this query you should see the hops back and forth between coordinators and DBServers with the shards (scatter/gather). You should make sure the number of hops in the cluster is as low as possible, and ideally the query doesn't go back and forth between coordinator and DBServer.

Since geo::gdistance does floating-point math, you should strive not to use it too often on too many datasets.

Maybe using the IN list statement with FILTERs can work better for you.

@lalitlogical
Author

Hi @dothebart,

Today, after approximately 1-2 days, I tried again with the Node.js project and ran our APIs through Postman. Strangely, the above query is working fine.

A lot of queries run at the same time in our app, so I cannot say for sure that the issue occurs only for the above query. It might be a reachability issue caused by a badly written query, as you mention above.

I connected to one of my coordinators over SSH and ran the above command.

ip-10-0-1-221 ~ # netstat -aplnt |grep TIME_WAIT |wc -l
54

I ran the query with explain and got the explanation below.

Query string:
 FOR p IN posts LET tdistance = geo::gdistance(p.latitude, p.longitude, @latitude, @longitude) FILTER 
 p.deleted == false SORT tdistance, p.created_at desc COLLECT dusers = p.user INTO distinctUsers SORT 
 distinctUsers[*].p.created_at desc FOR user in users FILTER user._key == distinctUsers[0].p.user FOR 
 user_info IN user_infos FILTER user_info.user == distinctUsers[0].p.user LET count = 
 LENGTH(distinctUsers), comments = LENGTH(FOR c IN comments FILTER c.post == distinctUsers[0].p._key 
 FOR tu in users FILTER tu._key == c.user RETURN c), blocked_me = LENGTH(FOR bu IN blocked_users 
 FILTER bu.user == user._key and bu.blocked_user == @user LIMIT 1 RETURN bu), blocked_by_me = 
 LENGTH(FOR bu IN blocked_users FILTER bu.user == @user and bu.blocked_user == user._key LIMIT 1 
 RETURN bu), distance = geo::gdistance(distinctUsers[0].p.latitude, distinctUsers[0].p.longitude, 
 @latitude, @longitude) FILTER blocked_by_me == 0 AND blocked_me == 0 SORT distance LIMIT @offset, 
 @limit RETURN {'rid': distinctUsers[...

Execution plan:
 Id   NodeType                  Site       Est.   Comment
  1   SingletonNode             DBS           1   * ROOT
  2   EnumerateCollectionNode   DBS           3     - FOR p IN posts   /* full collection scan */
  4   CalculationNode           DBS           3       - LET #23 = (p.`deleted` == false)   /* simple expression */   /* collections used: p : posts */
 65   RemoteNode                COOR          3       - REMOTE
 66   GatherNode                COOR          3       - GATHER
  3   CalculationNode           COOR          3       - LET tdistance = GEO::GDISTANCE(p.`latitude`, p.`longitude`, 26.8525084, 80.9486008)   /* user-defined function */   /* v8 expression */   /* collections used: p : posts */
  5   FilterNode                COOR          3       - FILTER #23
  6   CalculationNode           COOR          3       - LET #25 = p.`created_at`   /* attribute expression */   /* collections used: p : posts */
  7   SortNode                  COOR          3       - SORT tdistance ASC, #25 DESC
  8   CalculationNode           COOR          3       - LET #27 = p.`user`   /* attribute expression */   /* collections used: p : posts */
 52   SortNode                  COOR          3       - SORT #27 ASC
  9   CollectNode               COOR          3       - COLLECT dusers = #27 INTO distinctUsers   /* sorted*/
 10   CalculationNode           COOR          3       - LET #29 = distinctUsers[*].`p`.`created_at`   /* simple expression */
 11   SortNode                  COOR          3       - SORT #29 DESC
 59   ScatterNode               COOR          3       - SCATTER
 60   RemoteNode                DBS           3       - REMOTE
 53   IndexNode                 DBS           3       - FOR user IN users   /* primary index scan */
 61   RemoteNode                COOR          3         - REMOTE
 62   GatherNode                COOR          3         - GATHER
 55   ScatterNode               COOR          3         - SCATTER
 56   RemoteNode                DBS           3         - REMOTE
 15   EnumerateCollectionNode   DBS           9         - FOR user_info IN user_infos   /* full collection scan */
 16   CalculationNode           DBS           9           - LET #33 = (user_info.`user` == distinctUsers[0].`p`.`user`)   /* simple expression */   /* collections used: user_info : user_infos */
 17   FilterNode                DBS           9           - FILTER #33
 57   RemoteNode                COOR          9           - REMOTE
 58   GatherNode                COOR          9           - GATHER
 27   SubqueryNode              COOR          9           - LET #11 = ...   /* subquery */
 19   SingletonNode             DBS           1             * ROOT
 20   EnumerateCollectionNode   DBS           0               - FOR c IN comments   /* full collection scan */
 21   CalculationNode           DBS           0                 - LET #35 = (c.`post` == distinctUsers[0].`p`.`_key`)   /* simple expression */   /* collections used: c : comments */
 22   FilterNode                DBS           0                 - FILTER #35
 73   RemoteNode                COOR          0                 - REMOTE
 74   GatherNode                COOR          0                 - GATHER
 67   ScatterNode               COOR          0                 - SCATTER
 68   RemoteNode                DBS           0                 - REMOTE
 54   IndexNode                 DBS           0                 - FOR tu IN users   /* primary index scan */
 69   RemoteNode                COOR          0                   - REMOTE
 70   GatherNode                COOR          0                   - GATHER
 26   ReturnNode                COOR          0                   - RETURN c
 35   SubqueryNode              COOR          9           - LET #15 = ...   /* subquery */
 29   SingletonNode             DBS           1             * ROOT
 30   EnumerateCollectionNode   DBS           0               - FOR bu IN blocked_users   /* full collection scan */
 31   CalculationNode           DBS           0                 - LET #39 = ((bu.`user` == user.`_key`) && (bu.`blocked_user` == 1100953))   /* simple expression */   /* collections used: bu : blocked_users, user : users */
 32   FilterNode                DBS           0                 - FILTER #39
 77   RemoteNode                COOR          0                 - REMOTE
 78   GatherNode                COOR          0                 - GATHER
 33   LimitNode                 COOR          0                 - LIMIT 0, 1
 34   ReturnNode                COOR          0                 - RETURN bu
 43   SubqueryNode              COOR          9           - LET #19 = ...   /* subquery */
 37   SingletonNode             DBS           1             * ROOT
 38   EnumerateCollectionNode   DBS           0               - FOR bu IN blocked_users   /* full collection scan */
 39   CalculationNode           DBS           0                 - LET #41 = ((bu.`user` == 1100953) && (bu.`blocked_user` == user.`_key`))   /* simple expression */   /* collections used: bu : blocked_users, user : users */
 40   FilterNode                DBS           0                 - FILTER #41
 81   RemoteNode                COOR          0                 - REMOTE
 82   GatherNode                COOR          0                 - GATHER
 41   LimitNode                 COOR          0                 - LIMIT 0, 1
 42   ReturnNode                COOR          0                 - RETURN bu
 46   CalculationNode           COOR          9           - LET #43 = ((LENGTH(#19) == 0) && (LENGTH(#15) == 0))   /* simple expression */
 45   CalculationNode           COOR          9           - LET distance = GEO::GDISTANCE(distinctUsers[0].`p`.`latitude`, distinctUsers[0].`p`.`longitude`, 26.8525084, 80.9486008)   /* user-defined function */   /* v8 expression */
 47   FilterNode                COOR          9           - FILTER #43
 48   SortNode                  COOR          9           - SORT distance ASC
 49   LimitNode                 COOR          9           - LIMIT 0, 25
 50   CalculationNode           COOR          9           - LET #45 = { "rid" : distinctUsers[0].`p`.`_key`, "count" : LENGTH(distinctUsers), "text" : distinctUsers[0].`p`.`text`, "image" : distinctUsers[0].`p`.`image`, "video" : distinctUsers[0].`p`.`video`, "thumbnail" : distinctUsers[0].`p`.`thumbnail`, "views" : distinctUsers[0].`p`.`views`, "comments" : LENGTH(#11), "shares" : distinctUsers[0].`p`.`shares`, "latitude" : distinctUsers[0].`p`.`latitude`, "longitude" : distinctUsers[0].`p`.`longitude`, "distance" : distance, "ssid" : distinctUsers[0].`p`.`ssid`, "ip_address" : distinctUsers[0].`p`.`ip_address`, "deleted" : distinctUsers[0].`p`.`deleted`, "user" : user.`_key`, "first_name" : user.`first_name`, "last_name" : user.`last_name`, "user_picture" : user.`picture`, "user_picture_thumbnail" : user.`thumbnail`, ... }   /* simple expression */   /* collections used: user : users */
 51   ReturnNode                COOR          9           - RETURN #45

Indexes used:
 By   Type      Collection   Unique   Sparse   Selectivity   Fields       Ranges
 53   primary   users        true     false       100.00 %   [ `_key` ]   (user.`_key` == distinctUsers[0].`p`.`user`)
 54   primary   users        true     false       100.00 %   [ `_key` ]   (tu.`_key` == c.`user`)

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   remove-unnecessary-calculations
  4   move-calculations-up-2
  5   move-filters-up-2
  6   use-indexes
  7   remove-filter-covered-by-index
  8   move-calculations-down
  9   scatter-in-cluster
 10   distribute-filtercalc-to-cluster
 11   remove-unnecessary-remote-scatter

Is something wrong with it?

@dothebart
Contributor

As you can see, each pair of RemoteNode and GatherNode is a connection between a coordinator and a DBServer.

EnumerateCollectionNodes are full collection scans without index support, in contrast to IndexNodes, and should therefore be avoided by using appropriate FILTER statements.

UDFs can currently only be executed on the coordinator hosts.

Have a look at https://docs.arangodb.com/3.0/AQL/ExecutionAndPerformance/Optimizer.html for more details.
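
To put numbers on a plan instead of reading it line by line, the node types can be tallied. The sketch below is only an illustration: it assumes the plan is available as an array of objects that each carry a `type` string such as "RemoteNode" (the shape of `plan.nodes` in the JSON form of an explain result), and the name `summarizePlanNodes` is hypothetical.

```javascript
'use strict';

// Tally the node types that matter for cluster round trips and scans.
// `nodes` is assumed to be an array of objects with a `type` string.
function summarizePlanNodes(nodes) {
  var keyByType = {
    RemoteNode: 'remote',
    GatherNode: 'gather',
    ScatterNode: 'scatter',
    EnumerateCollectionNode: 'fullScans',
    IndexNode: 'indexScans'
  };
  var summary = { remote: 0, gather: 0, scatter: 0, fullScans: 0, indexScans: 0 };
  nodes.forEach(function (node) {
    // Ignore node types that are not in the lookup table.
    if (Object.prototype.hasOwnProperty.call(keyByType, node.type)) {
      summary[keyByType[node.type]] += 1;
    }
  });
  return summary;
}

// Tiny hand-written plan fragment, only to show the input shape.
var examplePlan = [
  { type: 'SingletonNode' },
  { type: 'EnumerateCollectionNode' },
  { type: 'RemoteNode' },
  { type: 'GatherNode' },
  { type: 'ReturnNode' }
];
console.log(summarizePlanNodes(examplePlan));
```

Applied to the plan posted above, such a tally would report five full collection scans (posts, user_infos, comments, and blocked_users twice) against two index scans, which is a quick way to see where FILTERs backed by indexes would pay off.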

@lalitlogical
Author

Hi @dothebart,

I started the app and moved randomly between screens to trigger queries. It ran fine for two to three minutes and then threw the same error again.

ip-10-0-1-221 ~ # netstat -aplnt |grep TIME_WAIT |wc -l 
84
ip-10-0-1-221 ~ # netstat -aplnt |grep TIME_WAIT |wc -l 
66
ip-10-0-1-221 ~ # netstat -aplnt |grep TIME_WAIT |wc -l 
177

The numbers move up and down, e.g. 74, 63, 66, 138, and at one point increased to 177.

@lalitlogical
Author

OK, I will go through your link and try to optimise my queries. That might help me resolve this issue.

@lalitlogical
Author

Hi @dothebart,

Do you think unoptimised queries are the cause of this issue, or is our architecture for the arangodb3 cluster service under DC/OS not set up properly? Basically, I need to know which direction to work in to resolve this issue.

Thanks

@dothebart
Contributor

dothebart commented Aug 5, 2016

The error message as you posted it is misleading. We have improved the message; this will be part of the next ArangoDB maintenance release.
The current version of the DC/OS framework lacks the ability to change some of the ArangoDB startup parameters. We will add this in the next release of the framework.
User-defined functions have some limitations. The documentation may not have been clear enough about that. We have changed the documentation to reflect this.

The query above should take these hints into account. As it stands, it does a full collection scan. This means that the coordinator has to copy all the data from the shards. This has implications:

  • The runtime of the query is too long and will grow non-linearly with your data
  • Such a query will block one or more V8 contexts for its runtime
  • If you have several of those queries running in parallel, the server runs out of resources

If you need further help to improve the query, please contact jan.stuecke at arangodb.com for more details.
