#180 made me think. For some reason my vision was to replace the sorting code, but I realized this might not be necessary: we already have everything we need.
I benchmarked the suggestion I gave in #180 on larger data sets, comparing:
- creating a keyvi file from scratch, using TPIE for sorting
- creating x small keyvi files (using the "small data compilers", which don't use the TPIE sort) and running the merger on them
I ran different cases; in summary, the merge approach was roughly 20% slower. Note that I did not optimize anything (I used simple Python scripts). My merge approach had to copy more data; an improved implementation would avoid that.
The idea is as follows:
1. Create an in-memory sorter.
2. If the in-memory sort buffer hits the threshold, sort the data, create an FSA, persist it, and free the buffers.
3. Go to 1.
4. After all data has been processed, sort, create, and persist the final chunk.
5. Merge the FSAs and create the final keyvi file.
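The steps above can be sketched in Python, the language of the benchmark scripts. This is a minimal, hypothetical sketch, not keyvi code: sorted tab-separated chunk files stand in for the per-chunk FSAs, the threshold counts entries rather than bytes, the standard library's `heapq.merge` performs the k-way merge, and for duplicate keys the value inserted last wins (an assumption; the real merger may behave differently). The function names are illustrative only, and keys and values are assumed to contain no tabs or newlines.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def _persist_chunk(buffer: List[Tuple[str, str]], directory: str, index: int) -> str:
    # Step 2: sort the buffered pairs and persist them. A sorted file stands in
    # for "create an FSA, persist it" (hypothetical stand-in, not keyvi).
    buffer.sort(key=lambda kv: kv[0])  # stable: keeps insertion order of equal keys
    path = os.path.join(directory, f"chunk_{index:05d}.tsv")
    with open(path, "w", encoding="utf-8") as out:
        for key, value in buffer:
            out.write(f"{key}\t{value}\n")
    return path


def _read_chunk(path: str, chunk_index: int) -> Iterator[Tuple[str, int, int, str]]:
    # Yield (key, chunk index, line number, value); the middle fields make every
    # tuple unique, so heapq.merge never has to compare values.
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f):
            key, value = line.rstrip("\n").split("\t", 1)
            yield key, chunk_index, line_no, value


def build_sorted_chunks(pairs: Iterable[Tuple[str, str]], threshold: int, directory: str) -> List[str]:
    paths: List[str] = []
    buffer: List[Tuple[str, str]] = []  # step 1: the in-memory sorter
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= threshold:
            paths.append(_persist_chunk(buffer, directory, len(paths)))
            buffer = []  # free the buffer and start over (steps 2 and 3)
    if buffer:
        paths.append(_persist_chunk(buffer, directory, len(paths)))  # step 4
    return paths


def merge_chunks(paths: List[str]) -> Iterator[Tuple[str, str]]:
    # Step 5: k-way merge of all sorted chunks, streamed key by key.
    streams = [_read_chunk(p, i) for i, p in enumerate(paths)]
    have_key = False
    current_key, current_value = "", ""
    for key, _chunk, _line, value in heapq.merge(*streams):
        if have_key and key != current_key:
            yield current_key, current_value
        # Later chunks (and later lines) overwrite earlier values for equal keys.
        current_key, current_value, have_key = key, value, True
    if have_key:
        yield current_key, current_value


if __name__ == "__main__":
    data = [("banana", "1"), ("apple", "2"), ("cherry", "3"), ("apple", "4"), ("date", "5")]
    with tempfile.TemporaryDirectory() as tmp:
        chunks = build_sorted_chunks(data, threshold=2, directory=tmp)
        print(list(merge_chunks(chunks)))
        # prints [('apple', '4'), ('banana', '1'), ('cherry', '3'), ('date', '5')]
```

In a real implementation, each chunk would be compiled with a small data compiler and the merge would be done by the keyvi merger; streaming the merge from the chunks, as the generator does here, may be one way to avoid extra intermediate copies.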