Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to turn off stacktrace bookkeeping #96

Closed
wants to merge 1 commit into from
Closed

Allow to turn off stacktrace bookkeeping #96

wants to merge 1 commit into from

Conversation

yonromai
Copy link

Hi,

The stack = traceback.extract_stack()[:-1] instruction makes inserts too slow for my use case.

This PR adds a flag to disable this behavior if needed.

According to some basic testing, it speeds up inserts by ~20x:

Test code:

from sqlitedict import SqliteDict

def insert(outer_stack):
    d = SqliteDict(outer_stack=outer_stack)
    for i in range(10000):
        d["key_{}".format(i)] = "value_{}".format(i)
    d.close()
%%timeit
insert(outer_stack=True)

=> 2.68 s ± 42.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
insert(outer_stack=False)

=> 121 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

sqlitedict.py Outdated Show resolved Hide resolved
sqlitedict.py Outdated Show resolved Hide resolved
sqlitedict.py Outdated Show resolved Hide resolved
@mpenkov mpenkov requested a review from piskvorky April 25, 2019 06:34
@piskvorky
Copy link
Owner

piskvorky commented Apr 25, 2019

makes inserts too slow for my use case

Can you post some benchmarks? before / after your PR.

Intuitively, I'd expect debugging features that significantly slow down normal execution to be an opt-in, not opt-out. But if the difference is "slight", as per PR #28, it's OK as opt-out for power-users. So the benchmark numbers matter. CC @jquast .

@yonromai
Copy link
Author

@mpenkov Thanks for the review - Amended the commit based on your comments, PTAL

@yonromai
Copy link
Author

@piskvorky Although I cannot share my code (not open source), I observed a similar improvement as shown above (about ~23x faster after than before, although I didn't compute the variance in the real use case, as opposed to above).

FWIW I am evaluating this map compared to a leveldb index and sqlitedict is still ~4x slower than leveldb in my use case.

@piskvorky
Copy link
Owner

Yeah, that wouldn't surprise me. Sqlitedict is aiming to be a simple-to-use wrapper for SQLite, speed is not one of our primary objectives.

@yonromai
Copy link
Author

@piskvorky LMK if you think I should make this the default behavior

@piskvorky
Copy link
Owner

@yonromai I'm not very familiar with this part of the code. Can you summarize the pros/cons?

What would we (the users, the developers) gain / lose by making it the default?

@yonromai
Copy link
Author

Pro: Inserts are 23x faster.
Con: If an error occurs, the stack trace will mostly get swallowed, and 'No outer stack to display. Enable it using outer_stack=True' will be displayed in place of the stacktrace.

@piskvorky
Copy link
Owner

piskvorky commented Apr 26, 2019

I'm not sure. I'm leaning slightly toward more intelligible messages… but damn, 23x is a lot! How often / under what conditions can these errors happen? In our code, in user code…?

@mpenkov WDYT?

'No outer stack to display. Enable it using outer_stack=True' will be displayed

That message is too vague: where should I (as a user) enter this outer_stack=True? Can you please make it more specific and actionable.

@yonromai yonromai closed this Jun 3, 2019
@mpenkov
Copy link
Collaborator

mpenkov commented Jun 5, 2019

@yonromai why did you close this PR?

@yonromai
Copy link
Author

yonromai commented Jun 5, 2019

@mpenkov I closed the PR because it's been stale and I'm trying to keep my github.com/pulls clean since I use it a lot.

I kept the fork alive in case you want to cherry pick the changes.

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 5, 2019

Are you able to complete the PR? You received feedback from @piskvorky. From what I can see:

  1. Document the use case (most importantly, source of exceptions)
  2. Improve error message

@yonromai
Copy link
Author

yonromai commented Jun 5, 2019

@mpenkov Sorry but I don't have much more bandwidth to dedicate to this (we decided to not use sqlitedict).

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 6, 2019

OK, that's fair enough. Thanks for the changes in your PR. We'll take it from here.

@purplesyringa
Copy link

Has there been any progress on this PR? traceback.extract_stack() stat(2)'s all Python source files in the traceback--making more syscalls than sqlite itself. No wonder this is so slow.

@mpenkov
Copy link
Collaborator

mpenkov commented Feb 22, 2022

@imachug No. Are you interested in fixing this?

@purplesyringa
Copy link

I thought this PR is more or less ready to merge, except maybe

That message is too vague: where should I (as a user) enter this outer_stack=True? Can you please make it more specific and actionable.

Do you have any other issues with this?

@mpenkov
Copy link
Collaborator

mpenkov commented Feb 22, 2022

No, TBH, I think that's the only thing left. Let me go clean up the error message and we can merge this.

@mpenkov mpenkov self-assigned this Feb 22, 2022
@mpenkov mpenkov added this to the 1.8.0 milestone Feb 22, 2022
@@ -140,6 +141,8 @@ def __init__(self, filename=None, tablename='unnamed', flag='c',
object.
The default is to use pickle.

If you disable `outer_stack`, the stacktrace at insert time won't be saved. This
Copy link
Owner

@piskvorky piskvorky Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More context / info please. As a user, what is "the stacktrace at insert time", what are the implications of "not saving" it? Why would I want / not want that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants