add DEBUG LOAD command #3243

oranagra · 2016-05-19T05:20:04Z

hi, as we discussed, adding a DEBUG command like RELOAD that doesn't first save to disk.
can be used by power users to load data without the need to restart the process.

i decided to support AOF too in this function and also concluded it will be more powerful if it won't flush the database (emptyDb) before loading the file (people can use it to load multiple rdb / aof files)

please note however, that FLUSHALL command saves an empty RDB or adds a flush command to the AOF, so if someone wishes to do a combination of FLUSHALL and then DEBUG LOAD, he'll have to put the file he wishes to load in the right location only after the FLUSHALL.
still since this is a debug command i feel making it more raw and powerful is the right thing, if you think otherwise let me know.

oranagra · 2016-05-19T05:49:50Z

also notice that loadDataFromDisk() can do exit().
i rather not do that in DEBUG, but decided it is better to utilize shared code with the startup process.

antirez · 2019-03-21T10:58:28Z

Hello @oranagra, I think DEBUG commands are exactly for this kind of abuses... so looks good to me, but I don't like the name, could we find a name that better provides the idea that the dataset is not flushed? MERGE? MERGELOAD? OVERLOAD? Not sure, maybe somebody has an idea? @itamarhaber @soveran @yossigo @dvirsky?

Also typo in the comment (I can fix it after merging but just to remember):

(without flashing)

Should be flushing.

dvirsky · 2019-03-21T11:07:38Z

First of all, being able to merge several RDBs into the same instance is a huge feature! maybe worth its own command? I can think of a few scenarios where this would have been helpful.

I like MERGELOAD.

Or maybe:

DEBUG LOAD [MERGE | NOMERGE] or something like that?

antirez · 2019-03-21T11:10:52Z

@dvirsky A MERGE option for LOAD looks an interesting approach indeed.

@oranagra I don't understand why there is an emptyDb(-1) call in the new code if the LOAD documentation states that there is no flushing.

antirez · 2019-03-21T11:13:28Z

Yet another thing @oranagra, shouldn't data loading be wrapped in protectClient() and unprotectClient() like in the RELOAD subcommand? See #4804 (that was incidentally found by @dvirsky and later fixed by @soloestoy).

soloestoy · 2019-03-21T11:26:51Z

BTW, @antirez @oranagra , I think the LOAD subcommand should only run in master mode, because we can't make sure if the RDB file is generated by this instance itself, even if it is, the RDB file may have a gap of the data in memory.

So we should update replication info and disconnect replications to force resync.

antirez · 2019-03-21T11:41:38Z

BTW, @antirez @oranagra , I think the LOAD subcommand should only run in master mode, because we can't make sure if the RDB file is generated by this instance itself, even if it is, the RDB file may have a gap of the data in memory.

So we should update replication info and disconnect replications to force resync.

I think the use case of DEBUG LOAD is to totally abuse the server and do strange stuff :-) So if the user is killing herself/himself, no problem 😸

oranagra · 2019-03-21T12:36:16Z

This is a very old PR (which i rebased yesterday to resolve a merge conflict with the help message).

indeed calls to protectClient() and unprotectClient() are missing (they did not exist when i created the PR).
i think that in the help message i meant to say "without saving" and not "without flushing", otherwise why would i leave the call to emptyDb()?
it would indeed be nice to add a flag for NOFLUSH
i think that being a DEBUG command, we don't need to handle slaves differently. user that calls it should know what he's doing.

@antirez shall i fix this PR (adding NOFLUSH) and protectClient?
if so, should i do it in an additional commit? or squash them into one?
(github will still let you see the diff that the new push adds, and it'll avoid putting bugs in the repo history that were never actually part of the repo at any given time).

antirez · 2019-03-21T12:45:44Z

@oranagra Oh ok so there was no actual attempt at avoiding the flush. The problem with not flushing is that is useful only if we then don't crash on duplicated keys, that involves probably touching too much code, maybe we could skip that for now.

Now that I understood a bit better the goal, I've the feeling that adding a NOSAVE option to DEBUG RELOAD and, if useful (hardly) in LOADAOF, may be more sensible. This way we basically avoid duplicating the code and adding subcommands. Makes sense? I would squash the commit in that case indeed.

oranagra · 2019-03-21T13:56:38Z

@antirez sorry for the back and forth (3 year old PR), so i don't remember what i meant.
re-reading my original post at the top, i see that i did remove the emptydb() on purpose, and that i used loadDataFromDisk() rather than rdbLoad() on purpose too.

i.e. i did want to let power users load either AOF or RDB without either saving or flushing, and have them use FLUSHDB manually if they want just one.
for some reason, i few weeks later i added the call to emptydb().

so it is now up to you.

replace all of that with a simple NOSAVE option to DEBUG RELOAD
or keep the current code in the PR for DEBUG LOAD add an optional NOFLUSH

Related to #3243.

antirez · 2020-04-09T10:11:04Z

@oranagra please check the faster-rdb-loading branch.

oranagra · 2020-04-09T13:41:59Z

@antirez looks good to me.
i have a few minor concerns:

i'm afraid a module may try to RM_RetainString the key name it gets from RM_GetKeyNameFromIO, so the stack allocated robj may be problematic.
on the other hand, maybe should keep using the static stack allocation and force modules that use it to create a new string if for some reason they wish to retain it.

i'm worried we may want to add more logic to dbAdd some day and regret that rdb.c calls dictAdd directly, please consider adding flags to dbAdd, or adding a variant like dbAddLoadedKey or alike. i think we should avoid doing direct dictFind and dictAdd in random places in the source tree, and always use dict.c wrappers.

antirez · 2020-04-09T13:59:58Z

@oranagra thanks for the solid suggestions, I'm upgrading my branch using your hints, and then re-comment here with what I did to try addressing the issues you raised.

antirez · 2020-04-09T14:29:48Z

@oranagra I updated the branch in two ways:

Now statically allocated objects will assert on incrRefCount().
RDB loading refactored to a function "near" dbAdd().

I guess we should also document that modules should not retain keys, even if it will be pretty obvious from the effects: who attempts to do that will se the server crashing immediately during developments.

oranagra · 2020-04-09T16:45:56Z

@antirez looks like you forgot to use OBJ_STATIC_REFCOUNT in initStaticStringObject.

personally i don't think it's necessary to document that issue in RM_RetainString. it's a rare use case, and as you said, it'll be obvious when testing it at development time.

antirez · 2020-04-09T17:12:50Z

@oranagra I did it in another commit immediately after, but should already be there in the commits in the branch, maybe I forgot to push it. Checking.

antirez · 2020-04-09T17:15:04Z

@oranagra you are right! I forgot to push the last commit, now is there as well. So you look positive about it... merging :-) 25% speedup in loading time is really quite interesting after all, and the new DEBUG RELOAD could be useful. Thank you for the help.

antirez · 2020-04-09T17:15:27Z

(closing)

Related to #3243.

Related to redis#3243.

oranagra mentioned this pull request Feb 21, 2019

Add redis-cli support for diskless replication (CAPA EOF) #4811

Merged

oranagra added 2 commits March 20, 2019 17:28

add DEBUG LOAD command

ff47652

add flushdb to DEBUG LOAD

3fec9fb

antirez added this to the Redis 6 milestone Mar 25, 2020

antirez added a commit that referenced this pull request Apr 9, 2020

More powerful DEBUG RELOAD.

4518725

Related to #3243.

antirez closed this Apr 9, 2020

antirez added a commit that referenced this pull request Apr 15, 2020

More powerful DEBUG RELOAD.

23094ba

Related to #3243.

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Apr 21, 2020

More powerful DEBUG RELOAD.

301c803

Related to redis#3243.

oranagra mentioned this pull request Jul 21, 2020

Stack allocated RedisModuleString #7544

Closed

oranagra deleted the DEBUG_LOAD branch November 14, 2021 14:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add DEBUG LOAD command #3243

add DEBUG LOAD command #3243

oranagra commented May 19, 2016

oranagra commented May 19, 2016

antirez commented Mar 21, 2019

dvirsky commented Mar 21, 2019

antirez commented Mar 21, 2019

antirez commented Mar 21, 2019 •

edited

Loading

soloestoy commented Mar 21, 2019

antirez commented Mar 21, 2019

oranagra commented Mar 21, 2019

antirez commented Mar 21, 2019

oranagra commented Mar 21, 2019

antirez commented Apr 9, 2020

oranagra commented Apr 9, 2020

antirez commented Apr 9, 2020

antirez commented Apr 9, 2020

oranagra commented Apr 9, 2020

antirez commented Apr 9, 2020

antirez commented Apr 9, 2020 •

edited

Loading

antirez commented Apr 9, 2020

add DEBUG LOAD command #3243

add DEBUG LOAD command #3243

Conversation

oranagra commented May 19, 2016

oranagra commented May 19, 2016

antirez commented Mar 21, 2019

dvirsky commented Mar 21, 2019

antirez commented Mar 21, 2019

antirez commented Mar 21, 2019 • edited Loading

soloestoy commented Mar 21, 2019

antirez commented Mar 21, 2019

oranagra commented Mar 21, 2019

antirez commented Mar 21, 2019

oranagra commented Mar 21, 2019

antirez commented Apr 9, 2020

oranagra commented Apr 9, 2020

antirez commented Apr 9, 2020

antirez commented Apr 9, 2020

oranagra commented Apr 9, 2020

antirez commented Apr 9, 2020

antirez commented Apr 9, 2020 • edited Loading

antirez commented Apr 9, 2020

antirez commented Mar 21, 2019 •

edited

Loading

antirez commented Apr 9, 2020 •

edited

Loading