
Very bad performance iterating through a large number of entries #33

Open
chrbala opened this issue Mar 26, 2018 · 3 comments

Comments

@chrbala

chrbala commented Mar 26, 2018

Hello – thank you for working on SQLDown!

I'm trying to run the iterator over a large number of entries (~100,000). It is quite slow. I believe it is loading all the entries before doing anything, because after several (~6) minutes of nothing, it iterates through them quickly. When testing on a smaller dataset (~3,000 entries), I can read them in about 10 seconds total, with the majority of the time spent before any data is returned.

I believe that iter-stream is the bottleneck. It should either iterate one at a time or batch the reads into reasonably sized batches.
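To illustrate the batching idea: instead of materializing the whole result set up front, the reader can fetch a bounded batch ordered by key and resume from the last key seen (keyset pagination). A minimal sketch using Python's stdlib sqlite3, with an invented `kv (key, value)` table rather than sqldown's actual schema:

```python
import sqlite3

def iterate_in_batches(conn, batch_size=1000):
    """Yield (key, value) pairs in key order, reading one batch at a time."""
    last_key = None
    while True:
        if last_key is None:
            rows = conn.execute(
                "SELECT key, value FROM kv ORDER BY key LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            # Resume strictly after the last key we handed out.
            rows = conn.execute(
                "SELECT key, value FROM kv WHERE key > ? ORDER BY key LIMIT ?",
                (last_key, batch_size),
            ).fetchall()
        if not rows:
            break
        yield from rows
        last_key = rows[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(f"k{i:06d}", f"v{i}") for i in range(5000)])
entries = list(iterate_in_batches(conn, batch_size=500))
print(len(entries))  # 5000
```

Each batch is a cheap indexed range scan, so the first rows come back almost immediately regardless of table size.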

@calvinmetcalf
Owner

Could I get some more details, like:

  • what platform?
  • what database?

More generally, we use the database's built-in cursor functionality, but it has to run a query in order to start the stream. If the query is slow, it will take a while to get the first item, because the database needs to complete the query before it can return anything. Some possible issues:

MySQL doesn't support indexes on arbitrarily long fields, so if you are using MySQL, there is no index on the keys unless you set the keySize option, which could be an issue. I also wrote this a while ago and have learned a lot about database performance since then, so there could be something in the query that is fixable.
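For what it's worth, the difference a key index makes is easy to see with `EXPLAIN QUERY PLAN`. This is purely illustrative (the table and column names are made up, not sqldown's schema), shown here with SQLite since its planner output is easy to inspect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT, value TEXT)")  # no index on key

# Without an index, a key-range query must scan the whole table
# (and sort the results separately for ORDER BY).
plan_no_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM kv WHERE key > ? ORDER BY key", ("a",)
).fetchall()
print(plan_no_index[0][-1])  # a full SCAN of kv

conn.execute("CREATE INDEX kv_key ON kv (key)")

# With the index, the planner can seek to the start key and
# stream rows out in key order immediately.
plan_with_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM kv WHERE key > ? ORDER BY key", ("a",)
).fetchall()
print(plan_with_index[0][-1])  # a SEARCH USING INDEX kv_key
```

The same principle applies to MySQL: without an index on the key column, every iterator start pays for a full scan plus a sort.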

@chrbala
Author

chrbala commented Apr 9, 2018

Sure!

platform: OSX Sierra 10.12.6
database: SQLite

It looks like the code branches don't create indexes for SQLite. Also, the readme specifies keyLength and valueLength, but the actual option keys are keySize and valueSize.

Is there a reason why all the databases have three columns if (as the readme says) the column limitation only applies to Postgres?

@calvinmetcalf
Owner

Yeah, the 3 columns are also useful for dealing with iterators while updates are made. We could use cursors or whatnot, but that would be complicated to do for all the different db types. That being said, it does make indexes for SQLite: it makes indexes for every db that isn't MySQL.
