add DEBUG LOAD command #3243
also notice that loadDataFromDisk() can do exit(). |
Hello @oranagra, I think DEBUG commands are exactly for this kind of abuse... so it looks good to me, but I don't like the name. Could we find a name that better conveys the idea that the dataset is not flushed? MERGE? MERGELOAD? OVERLOAD? Not sure, maybe somebody has an idea? @itamarhaber @soveran @yossigo @dvirsky? Also, there is a typo in the comment (I can fix it after merging, this is just a reminder):
Should be flushing. |
First of all, being able to merge several RDBs into the same instance is a huge feature! Maybe it is worth its own command? I can think of a few scenarios where this would have been helpful. I like MERGELOAD. Or maybe:
|
Yet another thing @oranagra, shouldn't data loading be wrapped in protectClient() and unprotectClient() like in the RELOAD subcommand? See #4804 (that was incidentally found by @dvirsky and later fixed by @soloestoy). |
BTW, @antirez @oranagra , I think the So we should update replication info and disconnect replications to force resync. |
I think the use case of DEBUG LOAD is to totally abuse the server and do strange stuff :-) So if the user is killing herself/himself, no problem 😸 |
This is a very old PR (which i rebased yesterday to resolve a merge conflict with the help message).
@antirez shall i fix this PR (adding NOFLUSH) and protectClient? |
@oranagra Oh ok, so there was no actual attempt at avoiding the flush. The problem with not flushing is that it is useful only if we then don't crash on duplicated keys, which probably involves touching too much code, so maybe we could skip that for now. Now that I understand the goal a bit better, I have the feeling that adding a NOSAVE option to DEBUG RELOAD and, if useful (hardly), to LOADAOF, may be more sensible. This way we basically avoid duplicating the code and adding subcommands. Makes sense? I would squash the commit in that case indeed. |
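To make the option idea concrete, here is a minimal sketch of how trailing words such as NOSAVE and NOFLUSH could be turned into flags that decide which steps run. It is not the code from this PR or from the Redis source: `parseReloadOptions`, `planReload`, `reloadPlan`, and the `RELOAD_FLAG_*` names are hypothetical, and `strcasecmp` assumes a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical flag bits for this sketch; not taken from the Redis source. */
#define RELOAD_FLAG_NONE    0
#define RELOAD_FLAG_NOSAVE  (1 << 0) /* skip saving before loading */
#define RELOAD_FLAG_NOFLUSH (1 << 1) /* keep the current dataset while loading */

/* Parse the option words that would follow the subcommand name.
 * Matching is case insensitive. Returns 0 on success and -1 on an
 * unknown option; on failure *flags holds only the options seen so far. */
int parseReloadOptions(int argc, const char **argv, int *flags) {
    assert(flags != NULL);
    *flags = RELOAD_FLAG_NONE;
    for (int i = 0; i < argc; i++) {
        if (strcasecmp(argv[i], "NOSAVE") == 0) {
            *flags |= RELOAD_FLAG_NOSAVE;
        } else if (strcasecmp(argv[i], "NOFLUSH") == 0) {
            *flags |= RELOAD_FLAG_NOFLUSH;
        } else {
            return -1;
        }
    }
    return 0;
}

/* Which steps a reload would perform for a given flag set. */
typedef struct {
    int save;  /* 1 if the dataset is written to disk first */
    int flush; /* 1 if the in-memory dataset is emptied before loading */
} reloadPlan;

reloadPlan planReload(int flags) {
    reloadPlan plan;
    plan.save = !(flags & RELOAD_FLAG_NOSAVE);
    plan.flush = !(flags & RELOAD_FLAG_NOFLUSH);
    return plan;
}
```

With no options the plan keeps the existing save, flush, load sequence, so a single code path serves both the old behavior and the raw "just load the file" case.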
@antirez sorry for the back and forth (3 year old PR), so i don't remember what i meant. i.e. i did want to let power users load either AOF or RDB without either saving or flushing, and have them use FLUSHDB manually if they want just one. so it is now up to you.
|
@oranagra please check the |
@antirez looks good to me. i'm afraid a module may try to RM_RetainString the key name it gets from RM_GetKeyNameFromIO, so the stack allocated robj may be problematic. i'm worried we may want to add more logic to dbAdd some day and regret that rdb.c calls dictAdd directly, please consider adding flags to dbAdd, or adding a variant like dbAddLoadedKey or alike. i think we should avoid doing direct dictFind and dictAdd in random places in the source tree, and always use dict.c wrappers. |
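A rough sketch of the "variant of dbAdd for loaded keys" idea follows, using a toy fixed-size table instead of the real Redis dict. Everything here (`toyDb`, `toyDbAdd`, `toyDbAddLoadedKey`) is hypothetical and only illustrates the shape of the API: a strict add that treats a duplicate key as a programming error, and a load-time variant that reports the duplicate to the caller so that loading without a flush does not have to crash.

```c
#include <assert.h>
#include <string.h>

#define TOY_DB_CAPACITY 16

/* Toy stand-in for a keyspace. Keys are borrowed pointers: the caller
 * must keep the strings alive for as long as the table is in use. */
typedef struct {
    const char *keys[TOY_DB_CAPACITY];
    int values[TOY_DB_CAPACITY];
    int used;
} toyDb;

/* Linear lookup; returns the slot index or -1 if the key is absent. */
static int toyDbFind(const toyDb *db, const char *key) {
    for (int i = 0; i < db->used; i++) {
        if (strcmp(db->keys[i], key) == 0) return i;
    }
    return -1;
}

/* Load-time variant: returns 1 if the key was added, 0 if it already
 * exists or the table is full. The caller decides how to react. */
int toyDbAddLoadedKey(toyDb *db, const char *key, int value) {
    if (toyDbFind(db, key) != -1) return 0;
    if (db->used == TOY_DB_CAPACITY) return 0;
    db->keys[db->used] = key;
    db->values[db->used] = value;
    db->used++;
    return 1;
}

/* Strict variant for normal command paths: the key must not exist,
 * so a failed add aborts the program. */
void toyDbAdd(toyDb *db, const char *key, int value) {
    int added = toyDbAddLoadedKey(db, key, value);
    assert(added);
    (void)added; /* silence the unused warning when NDEBUG is defined */
}
```

Keeping both entry points in one module means any logic added to the add path later lives in a single place, which is the concern raised above about calling the dictionary functions directly from the loading code.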
@oranagra thanks for the solid suggestions, I'm upgrading my branch using your hints, and then re-comment here with what I did to try addressing the issues you raised. |
@oranagra I updated the branch in two ways:
I guess we should also document that modules should not retain keys, even if it will be pretty obvious from the effects: whoever attempts to do that will see the server crash immediately during development. |
@antirez looks like you forgot to use personally i don't think it's necessary to document that issue in RM_RetainString. it's a rare use case, and as you said, it'll be obvious when testing it at development time. |
@oranagra I did it in another commit immediately after, but should already be there in the commits in the branch, maybe I forgot to push it. Checking. |
@oranagra you are right! I forgot to push the last commit, now it is there as well. So you look positive about it... merging :-) A 25% speedup in loading time is really quite interesting after all, and the new DEBUG RELOAD could be useful. Thank you for the help. |
(closing) |
Related to redis#3243.
hi, as we discussed, adding a DEBUG command like RELOAD that doesn't first save to disk.
can be used by power users to load data without the need to restart the process.
i decided to support AOF too in this function and also concluded it will be more powerful if it won't flush the database (emptyDb) before loading the file (people can use it to load multiple rdb / aof files)
please note however, that FLUSHALL command saves an empty RDB or adds a flush command to the AOF, so if someone wishes to do a combination of FLUSHALL and then DEBUG LOAD, he'll have to put the file he wishes to load in the right location only after the FLUSHALL.
still since this is a debug command i feel making it more raw and powerful is the right thing, if you think otherwise let me know.