This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

query performance and points writing order #70

Open
ILENIMARIUS opened this issue Apr 14, 2020 · 6 comments

Comments

@ILENIMARIUS

Hello,
I am writing two points on every "for" iteration, then querying the database for those points using the time I started writing and the time I stopped writing as the interval bounds.
Everything works fine, but the query response is really slow compared to writing.
For instance:
tstart
for ( i = 1; i < 10000; i++ ) {
    write -> point { A }
    write -> point { B }
}
tstop
then I query all points in that interval -> SELECT * FROM /.*/ WHERE time >= tstart AND time < tstop
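The query construction above can be sketched in C++; `makeIntervalQuery` is a hypothetical helper (not part of influxdb-cxx) that builds the InfluxQL query from nanosecond epoch bounds:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the InfluxQL query for the write interval,
// with both time bounds inlined as nanosecond epoch integers.
std::string makeIntervalQuery(std::int64_t tstartNs, std::int64_t tstopNs) {
    return "SELECT * FROM /.*/ WHERE time >= " + std::to_string(tstartNs)
         + " AND time < " + std::to_string(tstopNs);
}
```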

I manage to write (over HTTP) 20,000 points in total (A and B) in 200 ms, but retrieving them takes 1.7 s, which is almost 10 times slower. This ratio holds no matter how many points I use.
How can I improve this query time?
And second:
I write 10,000 points of A and 10,000 points of B one by one, but when I select all of them in the influx command line, the result is sequential: the first 10,000 (A) points followed by the 10,000 (B) points, with the database timestamps of the two groups overlapping.

I expected a time-series ordering:
timestamp1 - A
timestamp2 - B
timestamp3 - A
timestamp4 - B

Why is that?

@awegrzyn
Owner

Hi,
Regarding query time: I added a quick check, and even the CI machine returns results within 20 ms.
Regarding ordering: it is InfluxDB's default behaviour to return results per series. Are you getting a different result when using the command line tool?
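Since InfluxDB returns each series as its own already-time-sorted block, interleaving A and B by timestamp is something the client can do with a two-way merge in O(n + m). A minimal sketch, with a hypothetical `Row` type standing in for whatever point structure the client uses:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row: a timestamp plus the series it came from.
struct Row { std::int64_t time; std::string series; };

// Classic two-way merge: both inputs are sorted by time (as InfluxDB
// returns them per series); the output is sorted across both series.
std::vector<Row> mergeByTime(const std::vector<Row>& a, const std::vector<Row>& b) {
    std::vector<Row> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i].time <= b[j].time ? a[i++] : b[j++]);
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```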

@ILENIMARIUS
Author

Hello,
First, thank you for your message.
The query itself responds within 20 ms, but the time to actually get the data out is the problem.
I checked, and I think the bottleneck is somewhere after the data is retrieved from the database, in a post-processing step. We have a tree-like (JSON-type) structure, and in post-processing we retrieve and parse the data.
Example of raw extracted data with another library (https://github.com/orca-zhang/influxdb-cpp):
{"results":[{"statement_id":0,"series":[{"name":"Signal1","columns":["time","Data","Timestamp"],"values":[["2020-04-15T06:40.....
I have tested with the library mentioned above, which makes a query and stores everything in a string.
The read performance is outstanding, but after reading we have to post-process the JSON-like data, which adds some time. Even with this extra time, I can read and process much faster than I can write with the influxdb-cxx library.
Results:
influxdb-cxx write:
1,000,000 points, each with
.field timestamp -> chrono timestamp
.field data -> 23-char string
write: duration - 9849 milliseconds

influxdb-cpp ->
read all, over all time:
DATA REQUEST: duration - 6103 milliseconds
extract all values and put them in a vector:
"values":[["2020-04-15T06:40:44.818300968Z","FF:4D:30:30:0F:0D:0D:FF","1585559900000000000"]
DATA PROCESSING: duration - 2843 milliseconds
VECTOR SIZE: 3,000,000 (each point has 3 values: 1. database timestamp, 2. chrono timestamp, 3. data string)
Can we optimize something to improve the read speed of influxdb-cxx?
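For reference, the post-processing step can be done with a single linear scan over the raw response instead of a general JSON parse. A sketch under the assumption that every cell in the `values` array is a quoted string, as in the excerpt above (function name hypothetical; not the library's code):

```cpp
#include <array>
#include <string>
#include <vector>

// Extract [time, field1, field2] triples from the raw "values" array by
// scanning for quoted strings -- no stringstream, no JSON library.
// Assumes every cell is a quoted string, as in the response excerpt.
std::vector<std::array<std::string, 3>> extractValues(const std::string& raw) {
    std::vector<std::array<std::string, 3>> rows;
    std::size_t pos = raw.find("\"values\":[");
    if (pos == std::string::npos) return rows;
    pos += 10;  // skip past "values":[
    while ((pos = raw.find('[', pos)) != std::string::npos) {
        std::array<std::string, 3> row;
        for (auto& cell : row) {
            std::size_t q1 = raw.find('"', pos);
            if (q1 == std::string::npos) return rows;
            std::size_t q2 = raw.find('"', q1 + 1);
            if (q2 == std::string::npos) return rows;
            cell = raw.substr(q1 + 1, q2 - q1 - 1);
            pos = q2 + 1;
        }
        rows.push_back(row);
        pos = raw.find(']', pos);  // skip the row's closing bracket
        if (pos == std::string::npos) break;
        ++pos;
    }
    return rows;
}
```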

@ILENIMARIUS
Author

Hello again.
Looking through the code, I found a bottleneck in
/master/src/InfluxDB.cxx, lines 90 to 119.
For instance, using stringstream there is time consuming.
Some of the series parsing can be optimized.
I will let you know if I have more details.
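Not the library's actual code, but an illustration of the kind of change that helps: tokenizing with `std::string::find`/`substr` instead of constructing a `std::stringstream` per line avoids the per-call stream-state setup cost:

```cpp
#include <string>
#include <vector>

// Split on a delimiter with plain find/substr: no stream objects are
// constructed, so there is no per-line stream allocation or locale setup.
std::vector<std::string> splitFast(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::size_t start = 0, end;
    while ((end = s.find(delim, start)) != std::string::npos) {
        out.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    out.push_back(s.substr(start));
    return out;
}
```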

@awegrzyn
Owner

Indeed, this should definitely be optimised...

@awegrzyn awegrzyn added this to the Release v0.6.0 milestone Apr 18, 2020
@awegrzyn
Owner

Actually I did check performance and I got:

  • 10k write: 116ms
  • 10k read: 195ms

@ILENIMARIUS
Author

ILENIMARIUS commented Apr 21, 2020

Hello awegrzyn,
Because it is not linear, try it like this:
10k batches.
Write: the same point 1,000,000 times with 2-3 fields, as fast as possible.
Read: SELECT * FROM "db" from the start of the writing time to the end.
The write/read ratio is somewhere between 1/5 and 1/9.
For instance, I write 1 million points in approx. 30 s,
and I hit a timeout trying to read them.
With the timeout removed, the read takes approx. 289 s.
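One possible workaround for the read timeout, assuming the client can issue several queries: split the write interval into contiguous windows and read one window at a time, so each query returns a bounded slice instead of the full million points at once. A sketch of the window computation (nanosecond bounds; names hypothetical):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Split [startNs, stopNs) into `chunks` contiguous half-open windows.
// Each window can then back one "time >= lo AND time < hi" query.
std::vector<std::pair<std::int64_t, std::int64_t>>
timeWindows(std::int64_t startNs, std::int64_t stopNs, int chunks) {
    std::vector<std::pair<std::int64_t, std::int64_t>> out;
    const std::int64_t span = stopNs - startNs;
    for (int i = 0; i < chunks; ++i) {
        std::int64_t lo = startNs + span * i / chunks;
        // Last window ends exactly at stopNs to avoid rounding gaps.
        std::int64_t hi = (i == chunks - 1) ? stopNs
                                            : startNs + span * (i + 1) / chunks;
        out.push_back({lo, hi});
    }
    return out;
}
```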
