Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArangoDB3.4.0 Connection reset #7785

Closed
wrrgit opened this issue Dec 18, 2018 · 12 comments
Closed

ArangoDB3.4.0 Connection reset #7785

wrrgit opened this issue Dec 18, 2018 · 12 comments
Labels
1 Analyzing 2 User Abandoned Resolution 3 Java Java Client Driver related

Comments

@wrrgit
Copy link

wrrgit commented Dec 18, 2018

My Environment

  • ArangoDB Version: 3.4.0
  • Storage Engine: RocksDB
  • Deployment Mode: default
  • Deployment Strategy: default
  • Configuration:
  • Infrastructure:
  • Operating System: CentOS 7.6.1810
  • Used Package: Linux version 3.10.0-693.2.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) Changes to the README.md file #1 SMP Tue Sep 12 22:26:13 UTC 2017
    [root@xu arangodb3]# lsb_release -a
    LSB Version: :core-4.1-amd64:core-4.1-noarch
    Distributor ID: CentOS
    Description: CentOS Linux release 7.6.1810 (Core)
    Release: 7.6.1810
    Codename: Core

Component, Query & Data

Affected feature:

AQL query (if applicable):@query("FOR shuoshuo IN t_shuoshuo SORT shuoshuo.date DESC #pageable "
+ " RETURN {"
+ " shuoshuo: shuoshuo,"
+ " likenumber: (FOR shuoshuolike IN t_shuoshuolike"
+ " FILTER shuoshuolike.shuoshuoId == shuoshuo._key"
+ " COLLECT WITH COUNT INTO likenumber RETURN likenumber) ,"
+ " ismyliked: (FOR shuoshuolike IN t_shuoshuolike"
+ " FILTER shuoshuolike.shuoshuoId == shuoshuo._key AND shuoshuolike.account == @account"
+ " COLLECT WITH COUNT INTO ismyliked RETURN ismyliked) ,"
+ " commentnumber: (FOR shuocomment IN t_shuoshuocomment"
+ " FILTER shuocomment.shuoshuoId == shuoshuo._key"
+ " COLLECT WITH COUNT INTO commentnumber RETURN commentnumber), " +
" replynumber: (FOR shuoreply IN t_shuoshuocommentreply"
+ " FILTER shuoreply.shuoshuoId == shuoshuo._key"
+ " COLLECT WITH COUNT INTO replynumber RETURN replynumber) ,"
+ " userinfo: (FOR userinfo IN t_userinfo"
+ " FILTER userinfo.account == shuoshuo.account "
+ " RETURN {avatarurl:userinfo.avatarurl,nickname:userinfo.nickname,online:userinfo.online} )"
+ "}")
Page findAllShuoShuoWithPage(@param("pageable") Pageable pageable,@param("account") long account);

AQL explain (if applicable):

Dataset:

Replication Factor & Number of Shards (Cluster only):

Steps to reproduce

Problem:
nested exception is com.arangodb.ArangoDBException: java.util.concurrent.ExecutionException: com.arangodb.ArangoDBException: java.net.SocketException: Connection reset] with root cause
java.net.SocketException: Connection reset

Expected result:

AQL log: /var/log/arangodb3/arangod.log:
2018-12-20T13:44:31Z [9203] INFO using storage engine rocksdb
2018-12-20T13:44:31Z [9203] INFO {cluster} Starting up with role SINGLE
2018-12-20T13:44:31Z [9203] INFO {syscall} file-descriptors (nofiles) hard limit is 131072, soft limit is 131072
2018-12-20T13:44:31Z [9203] WARNING {threads} --server.threads (64) is more than eight times the number of cores (1), this might overload the server
2018-12-20T13:44:31Z [9203] INFO {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2018-12-20T13:44:31Z [9203] WARNING found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 5869, but that process seems to be dead already
2018-12-20T13:44:32Z [9203] INFO using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2018-12-20T13:44:36Z [9203] INFO ArangoDB (version 3.4.0 [linux]) is ready for business. Have fun!
2018-12-21T01:04:45Z [9203] WARNING {engines} slow background settings sync: 2.600928 s
2018-12-21T01:05:00Z [12506] INFO ArangoDB 3.4.0 [linux] 64bit, using jemalloc, build tags/v3.4.0-0-g3a7df19189, VPack 0.1.33, RocksDB 5.16.0, ICU 58.1, V8 5.7.492.77, OpenSSL 1.1.0h  27 Mar 2018
2018-12-21T01:05:00Z [12506] INFO detected operating system: Linux version 3.10.0-693.2.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 UTC 2017
2018-12-21T01:05:00Z [12506] WARNING {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2018-12-21T01:05:00Z [12506] WARNING {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2018-12-21T01:05:00Z [12506] WARNING {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2018-12-21T01:05:00Z [12506] WARNING {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2018-12-21T01:05:00Z [12506] INFO {authentication} Jwt secret not specified, generating...
2018-12-21T01:05:00Z [12506] INFO using storage engine rocksdb
2018-12-21T01:05:00Z [12506] INFO {cluster} Starting up with role SINGLE
2018-12-21T01:05:00Z [12506] INFO {syscall} file-descriptors (nofiles) hard limit is 131072, soft limit is 131072
2018-12-21T01:05:00Z [12506] WARNING {threads} --server.threads (64) is more than eight times the number of cores (1), this might overload the server
2018-12-21T01:05:00Z [12506] INFO {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2018-12-21T01:05:00Z [12506] WARNING found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 9203, but that process seems to be dead already
2018-12-21T01:05:00Z [12506] INFO using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2018-12-21T01:05:02Z [12506] INFO ArangoDB (version 3.4.0 [linux]) is ready for business. Have fun!
2018-12-21T12:24:25Z [15624] INFO ArangoDB 3.4.0 [linux] 64bit, using jemalloc, build tags/v3.4.0-0-g3a7df19189, VPack 0.1.33, RocksDB 5.16.0, ICU 58.1, V8 5.7.492.77, OpenSSL 1.1.0h  27 Mar 2018
2018-12-21T12:24:25Z [15624] INFO detected operating system: Linux version 3.10.0-693.2.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 UTC 2017
2018-12-21T12:24:25Z [15624] WARNING {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2018-12-21T12:24:25Z [15624] WARNING {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2018-12-21T12:24:25Z [15624] WARNING {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2018-12-21T12:24:25Z [15624] WARNING {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2018-12-21T12:24:25Z [15624] INFO {authentication} Jwt secret not specified, generating...
2018-12-21T12:24:25Z [15624] INFO using storage engine rocksdb
2018-12-21T12:24:25Z [15624] INFO {cluster} Starting up with role SINGLE
2018-12-21T12:24:25Z [15624] INFO {syscall} file-descriptors (nofiles) hard limit is 131072, soft limit is 131072
2018-12-21T12:24:25Z [15624] WARNING {threads} --server.threads (64) is more than eight times the number of cores (1), this might overload the server
2018-12-21T12:24:25Z [15624] INFO {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2018-12-21T12:24:25Z [15624] WARNING found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 12506, but that process seems to be dead already
2018-12-21T12:24:26Z [15624] INFO using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2018-12-21T12:24:30Z [15624] INFO ArangoDB (version 3.4.0 [linux]) is ready for business. Have fun!
2018-12-21T23:50:18Z [18774] INFO ArangoDB 3.4.0 [linux] 64bit, using jemalloc, build tags/v3.4.0-0-g3a7df19189, VPack 0.1.33, RocksDB 5.16.0, ICU 58.1, V8 5.7.492.77, OpenSSL 1.1.0h  27 Mar 2018
2018-12-21T23:50:18Z [18774] INFO detected operating system: Linux version 3.10.0-693.2.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 UTC 2017
2018-12-21T23:50:18Z [18774] WARNING {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2018-12-21T23:50:18Z [18774] WARNING {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2018-12-21T23:50:18Z [18774] WARNING {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2018-12-21T23:50:18Z [18774] WARNING {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2018-12-21T23:50:18Z [18774] INFO {authentication} Jwt secret not specified, generating...
2018-12-21T23:50:18Z [18774] INFO using storage engine rocksdb
2018-12-21T23:50:18Z [18774] INFO {cluster} Starting up with role SINGLE
2018-12-21T23:50:18Z [18774] INFO {syscall} file-descriptors (nofiles) hard limit is 131072, soft limit is 131072
2018-12-21T23:50:18Z [18774] WARNING {threads} --server.threads (64) is more than eight times the number of cores (1), this might overload the server
2018-12-21T23:50:18Z [18774] INFO {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2018-12-21T23:50:18Z [18774] WARNING found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 15624, but that process seems to be dead already
2018-12-21T23:50:19Z [18774] INFO using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2018-12-21T23:50:21Z [18774] INFO ArangoDB (version 3.4.0 [linux]) is ready for business. Have fun!


2018-12-22
SpringBoot log:
08:33:28.552 logback [http-nio-8081-exec-10] INFO  c.dowa.controller.ShuoShuoController - getAllShuoShuoByPage account:10001  pageIndex=0   pageSize=20
getAllShuoShuoWithPage: pageIndex:0 pageSize:20
08:33:28.553 logback [pool-60-thread-1] ERROR c.a.i.v.internal.MessageStore - Connection reset
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at com.arangodb.internal.velocystream.internal.VstConnection.readBytesIntoBuffer(VstConnection.java:260)
	at com.arangodb.internal.velocystream.internal.VstConnection.readBytes(VstConnection.java:254)
	at com.arangodb.internal.velocystream.internal.VstConnection.readChunk(VstConnection.java:231)
	at com.arangodb.internal.velocystream.internal.VstConnection$1.call(VstConnection.java:141)
	at com.arangodb.internal.velocystream.internal.VstConnection$1.call(VstConnection.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
08:33:28.554 logback [http-nio-8081-exec-10] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [/dowa] threw exception [Request processing failed; nested exception is com.arangodb.ArangoDBException: java.util.concurrent.ExecutionException: com.arangodb.ArangoDBException: java.net.SocketException: Connection reset] with root cause
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at com.arangodb.internal.velocystream.internal.VstConnection.readBytesIntoBuffer(VstConnection.java:260)
	at com.arangodb.internal.velocystream.internal.VstConnection.readBytes(VstConnection.java:254)
	at com.arangodb.internal.velocystream.internal.VstConnection.readChunk(VstConnection.java:231)
	at com.arangodb.internal.velocystream.internal.VstConnection$1.call(VstConnection.java:141)
	at com.arangodb.internal.velocystream.internal.VstConnection$1.call(VstConnection.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@dothebart
Copy link
Contributor

Hi,
can you please have a look what the ArangoDB server did around that time? did it restart and log it?
What was the query in question? I'm afraid that with only this java backtrace we won't find out anything.

@Simran-B
Copy link
Contributor

Please edit your initial post and fill out the template with the ArangoDB version in use etc.

@OmarAyo
Copy link
Contributor

OmarAyo commented Dec 20, 2018

Hi @wqxcloud,

It would be nice if you can fill out the template with as much information as possible.

In addition may you answer dothebart's questions

Thanks

@wrrgit
Copy link
Author

wrrgit commented Dec 22, 2018

Hi,
can you please have a look what the ArangoDB server did around that time? did it restart and log it?
What was the query in question? I'm afraid that with only this java backtrace we won't find out anything.

I`m so sorry for look it now。 i have update it。So much configuration i used the default(do not have to change it)

@wrrgit
Copy link
Author

wrrgit commented Dec 22, 2018

Hi @wqxcloud,

It would be nice if you can fill out the template with as much information as possible.

In addition may you answer dothebart's questions

Thanks

I`m so sorry for look it now。 i have update it。So much configuration i used the default(do not have to change it)

@wrrgit
Copy link
Author

wrrgit commented Dec 22, 2018

Please edit your initial post and fill out the template with the ArangoDB version in use etc.

I`m so sorry for look it now。 i have update it

@Simran-B
Copy link
Contributor

@wqxcloud Did you start with 3.4.0 and a fresh data directory? Then the default engine would be RocksDB. If you upgraded with data from 3.3.x, then it would be MMFiles.

@wrrgit
Copy link
Author

wrrgit commented Dec 27, 2018

@wqxcloud Did you start with 3.4.0 and a fresh data directory? Then the default engine would be RocksDB. If you upgraded with data from 3.3.x, then it would be MMFiles.

I start with 3.4.0 and a fresh data directory.

@wrrgit
Copy link
Author

wrrgit commented Jan 4, 2019

I need your help,Thanks

@jsteemann
Copy link
Contributor

@wqxcloud : I am happy to look into this, but without further information I doubt we can make much progress here.
From the info above I can see that it is a single server using RocksDB and that it restarted (unintentionally) at some point. That restart probably has caused the connection reset. If there was no hardware failure or something alike then this restart is unexpected, and looks like something has crashed the server.

As the ticket does not contain any information about indexes, cardinality etc. of the underlying collections nor about the bind parameters used, I tried setting up the collections locally with some dummy data and used some test bind parameters.

bind = {account:"abc", offset: 10, limit: 1005}; 
q = `FOR shuoshuo IN t_shuoshuo SORT shuoshuo.date DESC LIMIT @offset, @limit RETURN { shuoshuo: shuoshuo, likenumber: (FOR shuoshuolike IN t_shuoshuolike FILTER shuoshuolike.shuoshuoId == shuoshuo._key COLLECT WITH COUNT INTO likenumber RETURN likenumber) , ismyliked: (FOR shuoshuolike IN t_shuoshuolike FILTER shuoshuolike.shuoshuoId == shuoshuo._key AND shuoshuolike.account == @account COLLECT WITH COUNT INTO ismyliked RETURN ismyliked) , commentnumber: (FOR shuocomment IN t_shuoshuocomment FILTER shuocomment.shuoshuoId == shuoshuo._key COLLECT WITH COUNT INTO commentnumber RETURN commentnumber), replynumber: (FOR shuoreply IN t_shuoshuocommentreply FILTER shuoreply.shuoshuoId == shuoshuo._key COLLECT WITH COUNT INTO replynumber RETURN replynumber) , userinfo: (FOR userinfo IN t_userinfo FILTER userinfo.account == shuoshuo.account RETURN {avatarurl:userinfo.avatarurl,nickname:userinfo.nickname,online:userinfo.online} )}`;
db._explain(q, bind); 
db._query(q, bind);

At least locally this does not trigger any problem.

It would there be helpful if you could try running the query in ArangoDB's web UI with a fixed set of bind parameters to see if that triggers the problem. Additionally it would be helpful to see what indexes are available on the underlying collection and how the data is structured.
The easiest thing to get there is to run the query in the web UI and then use the "Create debug package" functionality. This package should contain most of the required information and will likely help us reproduce the problem.
Thanks!

@vinaypyati
Copy link
Contributor

Hi, @wqxcloud,

Just sending a quick follow up. Did you have a chance to run the query in ArangoDB's web UI with a fixed set of parameters as jsteemann suggested? You can generate a debug package by clicking on "Create debug package" button. This package will help us further investigate the issue.

Thanks!

@maxkernbach
Copy link
Contributor

@wqxcloud

I am closing this issue for now since we have not heard back from you in a while.
In case you are still facing this problem, please provide the requested "debug package" and we will reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 Analyzing 2 User Abandoned Resolution 3 Java Java Client Driver related
Projects
None yet
Development

No branches or pull requests

7 participants