Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add DEBUG LOAD command #3243

Closed
wants to merge 2 commits into from
Closed

add DEBUG LOAD command #3243

wants to merge 2 commits into from

Conversation

oranagra
Copy link
Member

hi, as we discussed, adding a DEBUG command like RELOAD that doesn't first save to disk.
can be used by power users to load data without the need to restart the process.

i decided to support AOF too in this function and also concluded it will be more powerful if it won't flush the database (emptyDb) before loading the file (people can use it to load multiple rdb / aof files)

please note however, that FLUSHALL command saves an empty RDB or adds a flush command to the AOF, so if someone wishes to do a combination of FLUSHALL and then DEBUG LOAD, he'll have to put the file he wishes to load in the right location only after the FLUSHALL.
still since this is a debug command i feel making it more raw and powerful is the right thing, if you think otherwise let me know.

@oranagra
Copy link
Member Author

also notice that loadDataFromDisk() can do exit().
i rather not do that in DEBUG, but decided it is better to utilize shared code with the startup process.

@antirez
Copy link
Contributor

antirez commented Mar 21, 2019

Hello @oranagra, I think DEBUG commands are exactly for this kind of abuses... so looks good to me, but I don't like the name, could we find a name that better provides the idea that the dataset is not flushed? MERGE? MERGELOAD? OVERLOAD? Not sure, maybe somebody has an idea? @itamarhaber @soveran @yossigo @dvirsky?

Also typo in the comment (I can fix it after merging but just to remember):

(without flashing)

Should be flushing.

@dvirsky
Copy link
Contributor

dvirsky commented Mar 21, 2019

First of all, being able to merge several RDBs into the same instance is a huge feature! maybe worth its own command? I can think of a few scenarios where this would have been helpful.

I like MERGELOAD.

Or maybe:

DEBUG LOAD [MERGE | NOMERGE] or something like that?

@antirez
Copy link
Contributor

antirez commented Mar 21, 2019

@dvirsky A MERGE option for LOAD looks an interesting approach indeed.

@oranagra I don't understand why there is an emptyDb(-1) call in the new code if the LOAD documentation states that there is no flushing.

@antirez
Copy link
Contributor

antirez commented Mar 21, 2019

Yet another thing @oranagra, shouldn't data loading be wrapped in protectClient() and unprotectClient() like in the RELOAD subcommand? See #4804 (that was incidentally found by @dvirsky and later fixed by @soloestoy).

@soloestoy
Copy link
Collaborator

BTW, @antirez @oranagra , I think the LOAD subcommand should only run in master mode, because we can't make sure if the RDB file is generated by this instance itself, even if it is, the RDB file may have a gap of the data in memory.

So we should update replication info and disconnect replications to force resync.

@antirez
Copy link
Contributor

antirez commented Mar 21, 2019

BTW, @antirez @oranagra , I think the LOAD subcommand should only run in master mode, because we can't make sure if the RDB file is generated by this instance itself, even if it is, the RDB file may have a gap of the data in memory.

So we should update replication info and disconnect replications to force resync.

I think the use case of DEBUG LOAD is to totally abuse the server and do strange stuff :-) So if the user is killing herself/himself, no problem 😸

@oranagra
Copy link
Member Author

This is a very old PR (which i rebased yesterday to resolve a merge conflict with the help message).

  1. indeed calls to protectClient() and unprotectClient() are missing (they did not exist when i created the PR).
  2. i think that in the help message i meant to say "without saving" and not "without flushing", otherwise why would i leave the call to emptyDb()?
  3. it would indeed be nice to add a flag for NOFLUSH
  4. i think that being a DEBUG command, we don't need to handle slaves differently. user that calls it should know what he's doing.

@antirez shall i fix this PR (adding NOFLUSH) and protectClient?
if so, should i do it in an additional commit? or squash them into one?
(github will still let you see the diff that the new push adds, and it'll avoid putting bugs in the repo history that were never actually part of the repo at any given time).

@antirez
Copy link
Contributor

antirez commented Mar 21, 2019

@oranagra Oh ok so there was no actual attempt at avoiding the flush. The problem with not flushing is that is useful only if we then don't crash on duplicated keys, that involves probably touching too much code, maybe we could skip that for now.

Now that I understood a bit better the goal, I've the feeling that adding a NOSAVE option to DEBUG RELOAD and, if useful (hardly) in LOADAOF, may be more sensible. This way we basically avoid duplicating the code and adding subcommands. Makes sense? I would squash the commit in that case indeed.

@oranagra
Copy link
Member Author

@antirez sorry for the back and forth (3 year old PR), so i don't remember what i meant.
re-reading my original post at the top, i see that i did remove the emptydb() on purpose, and that i used loadDataFromDisk() rather than rdbLoad() on purpose too.

i.e. i did want to let power users load either AOF or RDB without either saving or flushing, and have them use FLUSHDB manually if they want just one.
for some reason, i few weeks later i added the call to emptydb().

so it is now up to you.

  1. replace all of that with a simple NOSAVE option to DEBUG RELOAD
  2. or keep the current code in the PR for DEBUG LOAD add an optional NOFLUSH

@antirez antirez added this to the Redis 6 milestone Mar 25, 2020
antirez added a commit that referenced this pull request Apr 9, 2020
@antirez
Copy link
Contributor

antirez commented Apr 9, 2020

@oranagra please check the faster-rdb-loading branch.

@oranagra
Copy link
Member Author

oranagra commented Apr 9, 2020

@antirez looks good to me.
i have a few minor concerns:

i'm afraid a module may try to RM_RetainString the key name it gets from RM_GetKeyNameFromIO, so the stack allocated robj may be problematic.
on the other hand, maybe should keep using the static stack allocation and force modules that use it to create a new string if for some reason they wish to retain it.

i'm worried we may want to add more logic to dbAdd some day and regret that rdb.c calls dictAdd directly, please consider adding flags to dbAdd, or adding a variant like dbAddLoadedKey or alike. i think we should avoid doing direct dictFind and dictAdd in random places in the source tree, and always use dict.c wrappers.

@antirez
Copy link
Contributor

antirez commented Apr 9, 2020

@oranagra thanks for the solid suggestions, I'm upgrading my branch using your hints, and then re-comment here with what I did to try addressing the issues you raised.

@antirez
Copy link
Contributor

antirez commented Apr 9, 2020

@oranagra I updated the branch in two ways:

  1. Now statically allocated objects will assert on incrRefCount().
  2. RDB loading refactored to a function "near" dbAdd().

I guess we should also document that modules should not retain keys, even if it will be pretty obvious from the effects: who attempts to do that will se the server crashing immediately during developments.

@oranagra
Copy link
Member Author

oranagra commented Apr 9, 2020

@antirez looks like you forgot to use OBJ_STATIC_REFCOUNT in initStaticStringObject.

personally i don't think it's necessary to document that issue in RM_RetainString. it's a rare use case, and as you said, it'll be obvious when testing it at development time.

@antirez
Copy link
Contributor

antirez commented Apr 9, 2020

@oranagra I did it in another commit immediately after, but should already be there in the commits in the branch, maybe I forgot to push it. Checking.

@antirez
Copy link
Contributor

antirez commented Apr 9, 2020

@oranagra you are right! I forgot to push the last commit, now is there as well. So you look positive about it... merging :-) 25% speedup in loading time is really quite interesting after all, and the new DEBUG RELOAD could be useful. Thank you for the help.

@antirez
Copy link
Contributor

antirez commented Apr 9, 2020

(closing)

@antirez antirez closed this Apr 9, 2020
antirez added a commit that referenced this pull request Apr 15, 2020
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Apr 21, 2020
@oranagra oranagra deleted the DEBUG_LOAD branch November 14, 2021 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants