Possible performance enhancements when loading big data #1110
Comments
Sorry, but I feel ignored. I have tried a solution myself. But when I remove the queue overhead by returning the fixed DbValue[] array directly from the buffer, performance drops again. Before command.Execute is done, GdsStatement.Clear() is called; I think the internal buffer must also be reset there, but I couldn't find out how to do that. It is clear to me that the performance increase is only relevant when client and server are on the same machine, because otherwise the overhead of reading from the network always dominates.
As you discovered, usually the network (when not on localhost) is what dominates the time. That said, reducing allocations is never a bad thing. But it needs to be backed by some reasonable scenario - i.e. going for pure speed is one dimension, but the code needs to be maintainable, extendable, etc. And fetching 60M records I would say is on the edge. I would rather have "fast" fetching of ~100 rows and "slow" fetching of 60M. Anyway, if you feel like it, feel free to create a PR and we can discuss the changes there.
I'm working on some modifications that stay compatible and also gain more performance, but only for the synchronous fetch.
You can close the issue now. The only change that gains between 5 and 10% is to switch the queue to object[], store only the raw values from ReadRow(), and on Dequeue copy the raw values into a fixed, pre-created DbValue[]. The raw object values from ReadRow() are then the only objects left to garbage-collect. So working with ITypedList instead of typed class objects is more effective for big data. I'm currently working on our own in-memory DB for Firebird, because the SQL overhead of reading data from the database can be reduced with cached tables, filtering and aggregating within the cache (like Power BI also does). It was a nice foray into your code; please don't be mad about my omissions.
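The raw-values queue with a single reused DbValue[] row could be sketched roughly like this (Java standing in for the C# internals; the `Value` wrapper and `RowQueue` names are illustrative, not the provider's actual API):

```java
import java.util.ArrayDeque;

// Illustrative stand-in for the provider's DbValue wrapper (hypothetical).
final class Value {
    private Object raw;
    void set(Object raw) { this.raw = raw; }
    Object get() { return raw; }
}

// Queue of raw object[] rows; dequeue copies into one fixed Value[] row.
final class RowQueue {
    private final ArrayDeque<Object[]> rows = new ArrayDeque<>();
    private final Value[] current;  // the single fixed row, reused on every dequeue

    RowQueue(int fieldCount) {
        current = new Value[fieldCount];
        for (int i = 0; i < fieldCount; i++) current[i] = new Value();
    }

    void enqueue(Object[] rawRow) { rows.addLast(rawRow); }

    // Returns the same Value[] instance each time, so no per-row allocation.
    Value[] dequeue() {
        Object[] raw = rows.pollFirst();
        if (raw == null) return null;
        for (int i = 0; i < current.length; i++) current[i].set(raw[i]);
        return current;
    }
}
```

Only the raw object[] arrays remain for the garbage collector; the wrapper row is allocated once.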
In the new version 9.1.1 I found the FetchSize property of the FbCommand object.
The default is 200 rows.
Without changing it, I can read all data from a 3-column table with 60 million rows (no WHERE or ORDER BY) at an average of 220,000 rows per second, on an i7 laptop with a local Firebird 3.0 database.
After increasing the FetchSize to 500,000 I can fetch the first 2 million rows at 450,000 rows per second. During further loading the average drops, finally reaching 320,000 rows per second.
Ok, after debugging: the maximum FetchSize is 16384, which is then always used.
When I execute the same command a second time, the cache in the Firebird server is now loaded, and the rate for the first million rows grows to 530,000 rows per second, but the final average is still 320,000.
That is good, but I wonder: why not a constant 450,000 or 530,000?
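One reason the larger batch helps is simply fewer server round trips: at the default of 200 rows per fetch, 60 million rows take 300,000 round trips, while at the 16384 cap they take about 3,700. A minimal ceiling-division sketch (illustrative helper, not provider code):

```java
// How many fetch round trips a result set needs at a given batch size.
final class FetchMath {
    static long roundTrips(long totalRows, long fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize;  // ceiling division
    }
}
```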
With the profiler I have seen that during reading, millions of DbValue and DbValue[] objects are created. So the garbage collector must destroy them and reorganize the heap, which takes a lot of time and slows down the loading process.
The Fetch() of GdsStatement puts all rows in a queue and then reads the rows one by one from the queue.
The FbDataReader can hold only one row per Read(), and GetValue copies/translates each value to the client.
So it makes no sense to queue the fetched rows first.
Try to load the rows one by one in Fetch(), with a fixed DbValue[] row in ReadRow(), and return this to the reader. Then only one row exists during the loading process.
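The single-row idea could be sketched as a reader that copies each fetched row into one fixed buffer instead of allocating a fresh row per Read() (Java stand-in; `SingleRowReader` and its members are illustrative, not the provider's API):

```java
import java.util.Iterator;
import java.util.List;

// Reader sketch: one fixed row buffer reused for every read(),
// instead of a freshly allocated row per fetched record.
final class SingleRowReader {
    private final Iterator<Object[]> source;  // stands in for ReadRow()
    private final Object[] row;               // the single fixed row

    SingleRowReader(List<Object[]> fetched, int fieldCount) {
        this.source = fetched.iterator();
        this.row = new Object[fieldCount];
    }

    boolean read() {
        if (!source.hasNext()) return false;
        System.arraycopy(source.next(), 0, row, 0, row.length);
        return true;
    }

    Object getValue(int i) { return row[i]; }
}
```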
I have done the same in my own app.
During Read() and GetValues(object[] values) I created my own copy of the row values and then processed the row. The average over all rows was 125,000 rows per second, with the same garbage problem.
After using only one values array, the average went up to 220,000 rows per second, and with the larger FetchSize additionally to 320,000 rows per second.
Everything depends on garbage collection.
I have tried ReadAsync, but this slows things down dramatically, to 100,000 rows per second, because of the big overhead and the time consumed creating millions of Tasks; so this isn't fast enough for any big data.
Because of the different handling of ReadAsync() and Read(), you could optimize the Read() part without influencing ReadAsync(), but I think FetchAsync can get the same optimization.
Because ReadAsync already starts a task, the fetch function does not need to be additionally async; FetchAsync, the server request and ReadRowAsync each create additional tasks. I believe it can all be done within the first task initiated by FbDataReader.
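The per-row task overhead can be illustrated by contrasting one future per row with a single task that drives the whole loop (Java's CompletableFuture standing in for .NET Tasks; a rough sketch, not the provider's code):

```java
import java.util.concurrent.CompletableFuture;

// Contrast: a future allocated per row vs one task running the whole loop.
final class AsyncFetchSketch {
    // Mirrors awaiting ReadAsync for every single row: one future object per row.
    static int perRowTasks(int rows) {
        int sum = 0;
        for (int i = 0; i < rows; i++) {
            sum += CompletableFuture.completedFuture(1).join();
        }
        return sum;
    }

    // One task drives the whole fetch loop; rows are consumed synchronously inside it.
    static int singleTask(int rows) {
        return CompletableFuture.supplyAsync(() -> {
            int sum = 0;
            for (int i = 0; i < rows; i++) sum += 1;
            return sum;
        }).join();
    }
}
```

Both produce the same result, but the first allocates one future object per row, which is the kind of per-row overhead described above.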
When I do the same with a Microsoft SQL Server, I get a constant loading rate of 670,000 rows per second.
But this isn't a reason to change the database ;-).
The 9.1.1 version is a little slower, roughly 8%, than version 7.10.1, which I used until now.
I'm sorry for my English, but I hope you understand me.