redis-cli ability to convert an RDB file into JSON or CSV and vice-versa #288

Open
antirez opened this issue Jan 7, 2012 · 16 comments

Comments

@antirez
Contributor

antirez commented Jan 7, 2012

A way to turn a Redis RDB database into something easy to parse and share outside the Redis world would be useful for many reasons. For instance, it would make it easy to mass-import Redis data into MySQL to run queries not possible with Redis, or to run other offline processing of the data.

Additional features that could be interesting:

  • Ability to export only keys matching a pattern, a type, or a DB number
  • Export data in a way that is valid JSON but also parseable with an ad-hoc parser that needs less state and performs better than a real JSON parser. That's easy to do: since Redis does not use nested data structures, generating a single entry/element per line is easy.
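
One way to read the "single entry per line" idea, as a rough sketch only: each key becomes one self-contained JSON object on its own line. The function name `dump_lines`, the field names (`db`, `key`, `type`, `value`), and the sample data are assumptions made for illustration, not a proposed redis-cli format.

```python
import json

def dump_lines(db, entries):
    """Yield one JSON object per (key, type, value) entry, one per line."""
    for key, kind, value in entries:
        # Sets have no JSON equivalent, so emit them as sorted lists.
        if kind == "set":
            value = sorted(value)
        # json.dumps escapes newlines inside strings, so a record never
        # spans more than one line of output.
        yield json.dumps(
            {"db": db, "key": key, "type": kind, "value": value},
            sort_keys=True,
        )

# Hypothetical sample data standing in for keys read from an RDB file.
sample = [
    ("greeting", "string", "hello\nworld"),
    ("tags", "set", {"b", "a"}),
    ("user:1", "hash", {"name": "bob"}),
]
lines = list(dump_lines(0, sample))
```

Because every record is complete on its own line, a consumer can split on newlines first and only then decode, without tracking nesting across the whole file.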

More ideas are welcome.

As a sub-task of this issue, we should write documentation for the RDB binary format, which is currently described only by the source code itself.

@perezd

perezd commented Jan 7, 2012

So, something like this then? https://github.com/delano/redis-dump

@antirez
Contributor Author

antirez commented Jan 7, 2012

Actually the feature request was triggered by exactly this tool, which IMHO is not the right approach but is trying to fix a real problem. More details here -> http://news.ycombinator.com/item?id=3437804

@mrb
Contributor

mrb commented Jan 8, 2012

👍 on documenting the .rdb spec. I can help, I've done some digging into it with some side projects. That sounds like a very sensible first step.

@delano

delano commented Jan 8, 2012

It's awesome to see this as a feature request for redis-cli.

How do you see it working? For example, I use redis-dump to create encrypted dumps without intermediary files. Would something like this be possible:

$ redis-cli [args] json-dump > dump.json
# OR
$ redis-cli [args] json-dump | gpg -c > dump.json.gpg

And would it be possible/interesting to have a command to get the JSON dump for a single key? (I use the hash representation in redis-report to calculate and tally the byte size of keys and DBs.)

I ran some tests this week (with yajl-ruby) and I was interested to find out that it's as fast or faster to parse each entry individually rather than parsing them in batches (and the memory usage is also a lot lower obviously). So outputting one record per line performs better and makes it possible to stream/pipe the data.
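
A minimal sketch of the per-entry parsing described above, in Python rather than yajl-ruby: each line is decoded on its own, so only one record is held in memory at a time. The function name `iter_records` and the record layout are assumptions for illustration.

```python
import io
import json

def iter_records(stream):
    """Lazily decode one JSON record per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# An in-memory stream stands in for a pipe or a dump file here.
dump = io.StringIO(
    '{"key": "a", "type": "string", "value": "1"}\n'
    '{"key": "b", "type": "list", "value": ["x", "y"]}\n'
)
types = [rec["type"] for rec in iter_records(dump)]
```

Passing `sys.stdin` instead of the `StringIO` object would let the same loop consume a dump piped in from another process.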

@xxx

xxx commented Jan 8, 2012

+1 for being able to dump a single key.

I recently needed to dump the data from a hash with around 50 million fields in it to clear up some space in our cluster. I was able to do it by dumping the keys to a file, then iterating over them, which ended up being a bit blah.

@muayyad-alsadi

I guess we need the ability to exclude a key glob
or include a key glob

examples
-x 'cache'
--only 'store.*'
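
A rough sketch of how such include/exclude globs could behave, using Python's `fnmatch` module. The function name `select_keys` and its parameters are hypothetical; the `-x` and `--only` options above are only a suggestion and do not exist in redis-cli.

```python
from fnmatch import fnmatchcase

def select_keys(keys, only=None, exclude=None):
    """Keep keys matching the `only` glob (if given) and not matching `exclude`."""
    selected = []
    for key in keys:
        if only is not None and not fnmatchcase(key, only):
            continue
        if exclude is not None and fnmatchcase(key, exclude):
            continue
        selected.append(key)
    return selected

# Hypothetical key names for illustration.
keys = ["store.items", "store.users", "cache.page", "session.1"]
only_store = select_keys(keys, only="store.*")
no_cache = select_keys(keys, exclude="cache*")
```

Note that with glob semantics a bare `cache` matches only a key named exactly `cache`, so the sketch uses `cache*` to drop everything starting with that prefix.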

@jinleileiking

I don't think adding this feature to redis-cli is a good idea.

What we need is just a tool like redis-dump.

For Redis itself, the RDB file is enough.

@zhupan

zhupan commented Jan 12, 2012

I don't think we need this feature.

@taylormobrod

Thumbs up for this feature - or at the very least, proper documentation of the rdb format. Being able to operate on Redis data offline would be very useful.

@AndrewGuenther

I don't think this feature is necessary in the cli. Proper documentation of the rdb format would be great and allow tools like redis-dump to become even better.

@sripathikrishnan

Here's my first attempt to document the RDB file format - https://github.com/sripathikrishnan/redis-rdb-tools/blob/master/RDB_File_Format.textile

@esmooov

esmooov commented Mar 12, 2012

And my attempt as well http://esmooov.github.com/rdbhs/ We should combine notes.

@sripathikrishnan

@esmooov : Agree. And glad to know two independent attempts arrived at similar/consistent notes!

Some of the TODOs based on a reading of both our documents -

1 Little v/s big endian - we have both struggled to document this consistently
2 Parsing of doubles
3 More worked out examples
4 A representative dump.rdb which exercises all code paths in a parser. I started work on such a dump, but the coverage isn't great at the moment.
5 Converting the notes to markdown and submitting a pull request on redis-doc, so that it can be merged with the official redis documentation once it is stable.

Do you think we should split up from here? I can put in some more worked out examples and create a more representative dump file; while you can look into big/little endian and the handling of doubles? Later, we can merge my notes into yours and move it to redis-doc

@esmooov

esmooov commented Mar 12, 2012

@sripathikrishnan That sounds good. Two things, though. I'm going to have difficulty resolving the endianness issues as I do not have a big-endian machine at my disposal to see whether the encodings are always little-endian or host-endian. Also, doubles are just parsed as bytestrings and then the host language reads the double from the string. However, I will document that.
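
To make the two open points concrete, here is a small sketch. The first part shows why byte order matters: the same four bytes decode to different integers depending on the assumed endianness. The second part reads a double stored as a length-prefixed ASCII string; the one-byte length prefix and the helper name `read_ascii_double` are assumptions for illustration, not a statement of what the RDB format specifies.

```python
import struct

# The same four bytes, read as an unsigned 32-bit integer in each byte order.
raw = bytes([0x01, 0x00, 0x00, 0x00])
as_little = struct.unpack("<I", raw)[0]
as_big = struct.unpack(">I", raw)[0]

def read_ascii_double(buf):
    """Read a 1-byte length, then that many ASCII bytes parsed as a float.

    Returns (value, bytes_consumed). The layout is an assumption made
    for this sketch.
    """
    length = buf[0]
    text = buf[1:1 + length].decode("ascii")
    return float(text), 1 + length

value, consumed = read_ascii_double(b"\x043.14rest")
```

A value stored as text does not depend on byte order at all, which may be why only the integer encodings raise the endianness question.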

I have my documentation in Textile, so the switch to Markdown for some love-child of our specs should be relatively painless.

Also, I am working on an interactive dump explorer for the end of my docs where a user can look at a full dump (perhaps whatever test dump you come up with?) and have the encoding broken down for them. I think it will be pretty useful.

Cheers,
Great to have another person working on this.

@mrb
Contributor

mrb commented Mar 12, 2012

@sripathikrishnan @esmooov You guys are awesome. In terms of the spec, I think it would be awesome to have something akin to https://github.com/mustache/spec , where you could have progressive, indicative test dumps to work on.

@sripathikrishnan

Here is my Python implementation of a dump -> JSON converter
https://github.com/sripathikrishnan/redis-rdb-tools

You can filter on the database, keys or data type. It's easy to add other types of converters. For now, I have implemented a JSON converter and a plain-text converter. The plain-text converter is diff and sort friendly, so you can easily diff the contents of two dump files.

Feedback appreciated!

PatKamin added a commit to PatKamin/redis that referenced this issue Oct 28, 2022