Replies: 3 comments 60 replies
-
mystery solved: we had a |
-
I've uploaded the CPU flame from before and after the CPU spike. |
-
If I cannot find a solution for this problem, unfortunately, I'll have to replace |
-
I've been using JanusGraph in production for fraud prevention scenarios, but recently we have been experiencing a degradation in ingestion throughput. Our backend is Cassandra/Elasticsearch, and below are the current ingestion statistics:

35820206 [metrics-logger-reporter-thread-1] INFO org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics - type=TIMER, name=org.apache.tinkerpop.gremlin.server.GremlinServer.op.eval, count=83902, min=346.476045, max=23306.430972, mean=5669.135124586575, stddev=4673.183770559961, median=4345.621524499999, p75=8064.485307999999, p95=15156.586356099986, p98=17926.15823546, p99=18811.370566920003, p999=23280.965525466003, mean_rate=2.343088469134227, m1=1.7210005889641737, m5=1.786056502065271, m15=1.9371755852213393, rate_unit=events/second, duration_unit=milliseconds
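For anyone who wants to compare this line across runs, here is a minimal Python sketch that splits the reporter output into fields. The `parse_metrics` helper and the `in_flight` estimate are my own illustration, not part of JanusGraph or Gremlin Server. The estimate applies Little's law (requests in flight ≈ arrival rate × mean latency) and assumes a roughly steady state, which may not hold exactly here.

```python
import re

# The metrics line from above, shortened to the fields used below.
LINE = (
    "type=TIMER, name=org.apache.tinkerpop.gremlin.server.GremlinServer.op.eval, "
    "count=83902, min=346.476045, max=23306.430972, mean=5669.135124586575, "
    "p95=15156.586356099986, p99=18811.370566920003, "
    "mean_rate=2.343088469134227, m1=1.7210005889641737, "
    "rate_unit=events/second, duration_unit=milliseconds"
)


def parse_metrics(line: str) -> dict:
    """Split 'key=value' pairs and convert numeric values to float."""
    fields = {}
    for key, value in re.findall(r"(\w+)=([^,\s]+)", line):
        try:
            fields[key] = float(value)
        except ValueError:
            fields[key] = value  # non-numeric fields such as type, name, rate_unit
    return fields


metrics = parse_metrics(LINE)

# Little's law: in-flight requests ~= rate (events/s) * mean latency (s).
# The mean is reported in milliseconds, hence the division by 1000.
in_flight = metrics["mean_rate"] * metrics["mean"] / 1000.0
print(f"mean latency: {metrics['mean'] / 1000.0:.2f} s")
print(f"estimated evals in flight: {in_flight:.1f}")
```

Under that assumption, the numbers above correspond to roughly 13 `eval` requests in flight at a time, each taking about 5.7 seconds on average.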
In this case, we are ingesting lots of payments in an upsert fashion. Our JanusGraph instance has 32GB of memory and 8 cores, with the following Gremlin/JVM settings:

I've noticed that when I pause the ingestion (payments worker) for a while, the ingestion throughput goes up temporarily after I restart the worker, and the amount of time with high throughput is directly correlated with the time the worker is paused/turned off. My first hypothesis is that this is related to the ID allocation process (ids.block-size/ids.authority.wait-time), but I'm not 100% sure.

Cassandra throughput after a few worker restarts with different timeframes

Can anyone who understands the JanusGraph implementation help me clarify what's going on?
See below some stats from the JanusGraph instance.

Stats while the ingestion is at high throughput

Stats while the ingestion is at low throughput (a few minutes later)