This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

query performance and points writing order #70

Open
ILENIMARIUS opened this issue Apr 14, 2020 · 6 comments

Comments

@ILENIMARIUS

Hello,
I am writing two points on every "for" iteration, then querying the database for those points using the time I started writing and the time I stopped writing as the interval bounds.
Everything works fine, but the query response is really slow compared to writing.
For instance:
tstart
for ( i = 1; i < 10000; i++ ) {
    write -> point { A }
    write -> point { B }
}
tstop
then I query all points in that interval -> SELECT * FROM /.*/ WHERE time >= tstart AND time < tstop
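The query construction above can be sketched in C++; `makeIntervalQuery` is a hypothetical helper (not part of influxdb-cxx) that builds the InfluxQL query from nanosecond epoch bounds:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the InfluxQL query for the write interval,
// with both time bounds inlined as nanosecond epoch integers.
std::string makeIntervalQuery(std::int64_t tstartNs, std::int64_t tstopNs) {
    return "SELECT * FROM /.*/ WHERE time >= " + std::to_string(tstartNs)
         + " AND time < " + std::to_string(tstopNs);
}
```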

I manage to write (over HTTP) 20,000 points in total (A and B) in 200 ms, but retrieving them takes 1.7 s, which is almost 10 times slower. This ratio holds no matter how many points I use.
How can I improve this query time?
And second:
I write 10,000 points of A and 10,000 points of B one by one, but when I select all of them in the influx command line, the result is sequential: the first 10,000 (A) points followed by the 10,000 (B) points, with the database timestamps of the two groups overlapping.

I expected a time-series ordering:
timestamp1 - A
timestamp2 - B
timestamp3 - A
timestamp4 - B

Why is that?

@awegrzyn
Owner

Hi,
Regarding query time: I added a quick check, and even the CI machine returns results within 20 ms.
Regarding ordering: it is InfluxDB's default behaviour to return results per series. Are you getting a different result when using the command line tool?
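Since InfluxDB returns each series as its own already-time-sorted block, interleaving A and B by timestamp is something the client can do with a two-way merge in O(n + m). A minimal sketch, with a hypothetical `Row` type standing in for whatever point structure the client uses:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row: a timestamp plus the series it came from.
struct Row { std::int64_t time; std::string series; };

// Classic two-way merge: both inputs are sorted by time (as InfluxDB
// returns them per series); the output is sorted across both series.
std::vector<Row> mergeByTime(const std::vector<Row>& a, const std::vector<Row>& b) {
    std::vector<Row> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i].time <= b[j].time ? a[i++] : b[j++]);
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```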

@ILENIMARIUS
Author

Hello,
First, thank you for your message.
The query itself responds within 20 ms, but the time to actually get the data out is the problem.
I checked, and I think the bottleneck is somewhere after the data is retrieved from the database, in a post-processing step. We have a tree-like (JSON-type) structure, and in post-processing we retrieve and parse the data.
Example of raw extracted data with another library (https://github.com/orca-zhang/influxdb-cpp):
{"results":[{"statement_id":0,"series":[{"name":"Signal1","columns":["time","Data","Timestamp"],"values":[["2020-04-15T06:40.....
I have tested with the library mentioned above, which makes a query and stores everything in a string.
The read performance is outstanding, but after reading we have to post-process the JSON-like data, which adds some time. Even with this extra time, I can read and process much faster than I can write with the influxdb-cxx library.
Results:
influxdb-cxx write:
1,000,000 points, each with
.field timestamp -> chrono timestamp
.field data -> 23-char string
write: duration - 9849 milliseconds

influxdb-cpp ->
read all, over all time:
DATA REQUEST: duration - 6103 milliseconds
extract all values and put them in a vector:
"values":[["2020-04-15T06:40:44.818300968Z","FF:4D:30:30:0F:0D:0D:FF","1585559900000000000"]
DATA PROCESSING: duration - 2843 milliseconds
VECTOR SIZE: 3,000,000 (each point has 3 values: 1. database timestamp, 2. chrono timestamp, 3. data string)
Can we optimize something to improve the read speed of influxdb-cxx?
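For reference, the post-processing step can be done with a single linear scan over the raw response instead of a general JSON parse. A sketch under the assumption that every cell in the `values` array is a quoted string, as in the excerpt above (function name hypothetical; not the library's code):

```cpp
#include <array>
#include <string>
#include <vector>

// Extract [time, field1, field2] triples from the raw "values" array by
// scanning for quoted strings -- no stringstream, no JSON library.
// Assumes every cell is a quoted string, as in the response excerpt.
std::vector<std::array<std::string, 3>> extractValues(const std::string& raw) {
    std::vector<std::array<std::string, 3>> rows;
    std::size_t pos = raw.find("\"values\":[");
    if (pos == std::string::npos) return rows;
    pos += 10;  // skip past "values":[
    while ((pos = raw.find('[', pos)) != std::string::npos) {
        std::array<std::string, 3> row;
        for (auto& cell : row) {
            std::size_t q1 = raw.find('"', pos);
            if (q1 == std::string::npos) return rows;
            std::size_t q2 = raw.find('"', q1 + 1);
            if (q2 == std::string::npos) return rows;
            cell = raw.substr(q1 + 1, q2 - q1 - 1);
            pos = q2 + 1;
        }
        rows.push_back(row);
        pos = raw.find(']', pos);  // skip the row's closing bracket
        if (pos == std::string::npos) break;
        ++pos;
    }
    return rows;
}
```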

@ILENIMARIUS
Author

Hello again.
Looking through the code, I found a bottleneck in
/master/src/InfluxDB.cxx, lines 90 to 119.
For instance, using stringstream there is time consuming.
Some of the series parsing can be optimized.
I will let you know if I have more details.
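Not the library's actual code, but an illustration of the kind of change that helps: tokenizing with `std::string::find`/`substr` instead of constructing a `std::stringstream` per line avoids the per-call stream-state setup cost:

```cpp
#include <string>
#include <vector>

// Split on a delimiter with plain find/substr: no stream objects are
// constructed, so there is no per-line stream allocation or locale setup.
std::vector<std::string> splitFast(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::size_t start = 0, end;
    while ((end = s.find(delim, start)) != std::string::npos) {
        out.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    out.push_back(s.substr(start));
    return out;
}
```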

@awegrzyn
Owner

Indeed, this should definitely be optimised...

@awegrzyn awegrzyn added this to the Release v0.6.0 milestone Apr 18, 2020
@awegrzyn
Owner

Actually I did check performance and I got:

  • 10k write: 116ms
  • 10k read: 195ms

@ILENIMARIUS
Author

ILENIMARIUS commented Apr 21, 2020

Hello awegrzyn,
Because it is not linear, try it like this:
10k batches.
Write: the same point 1,000,000 times with 2-3 fields, as fast as possible.
Read: SELECT * FROM "db" from the start of the writing time to the end.
The write/read ratio is somewhere between 1/5 and 1/9.
For instance, I write 1 million points in approx. 30 s,
and I hit a timeout trying to read them.
With the timeout removed, the read takes approx. 289 s.
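One possible workaround for the read timeout, assuming the client can issue several queries: split the write interval into contiguous windows and read one window at a time, so each query returns a bounded slice instead of the full million points at once. A sketch of the window computation (nanosecond bounds; names hypothetical):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Split [startNs, stopNs) into `chunks` contiguous half-open windows.
// Each window can then back one "time >= lo AND time < hi" query.
std::vector<std::pair<std::int64_t, std::int64_t>>
timeWindows(std::int64_t startNs, std::int64_t stopNs, int chunks) {
    std::vector<std::pair<std::int64_t, std::int64_t>> out;
    const std::int64_t span = stopNs - startNs;
    for (int i = 0; i < chunks; ++i) {
        std::int64_t lo = startNs + span * i / chunks;
        // Last window ends exactly at stopNs to avoid rounding gaps.
        std::int64_t hi = (i == chunks - 1) ? stopNs
                                            : startNs + span * (i + 1) / chunks;
        out.push_back({lo, hi});
    }
    return out;
}
```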
