Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: Add sqlite format option for dumptxoutset #24952

Closed
wants to merge 2 commits into from

Conversation

dunxen
Copy link
Contributor

@dunxen dunxen commented Apr 23, 2022

This allows RPC users of the dumptxoutset command to output to a SQLite DB format with human-readable fields.
This approach allows for trivial export to JSON and CSV by using sqlite3 (detailed in #24628).

There is a new optional 'format' parameter for dumptxoutset that can take on the values of either 'compact' or 'sqlite'
with 'compact' as the default. This allows for backward compatibility.

Closes #24628

@dunxen
Copy link
Contributor Author

dunxen commented Apr 23, 2022

Some basic timings for mainnet on my (admittedly not so amazing iMac), comparing to the existing compact format.

Time

Screenshot 2022-04-23 at 09 09 57

Size

Screenshot 2022-04-23 at 09 10 31


Roughly around twice as long and twice as big.

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 24, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25191 (build: Make --with-gui=qt6 configure option available on macOS by hebasto)
  • #24923 (Rework logging timer by MarcoFalke)
  • #24798 ([POC] build: Hello Qt 6 by hebasto)
  • #24410 ([kernel 2a/n] Split hashing/index GetUTXOStats codepaths, decouple from coinstatsindex by dongcarl)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@dunxen dunxen force-pushed the 2022-04-sqlite-dumptxoutset branch from eaa4e63 to 21ee90d Compare April 24, 2022 11:16
throw JSONRPCError(RPC_INTERNAL_ERROR, strprintf("Unable to open database file %s", path.u8string()));
}

ret = sqlite3_exec(db, "PRAGMA journal_mode=wal2;", nullptr, nullptr, nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment explaining why we want journal_mode set to wal2 and ideally add a link to SQLite documentation: https://www.sqlite.org/cgi/src/doc/wal2/doc/wal2.md

Copy link
Contributor

@prusnak prusnak Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, isn't journal_mode = off even better match for what we want? I think that not using journal might significantly improve write performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't need/use rollbacks. It's just an export so that makes sense.

Copy link
Contributor

@prusnak prusnak Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is now addressed in 84a6e28, thanks!

src/rpc/blockchain.h Outdated Show resolved Hide resolved
src/rpc/blockchain.h Outdated Show resolved Hide resolved
@dunxen
Copy link
Contributor Author

dunxen commented Apr 24, 2022

Some other failures related to string type conversions in Windows I'm not so familiar with here. Will have a closer look when I'm back home.

EDIT: Never mind. I can't read...

@dunxen dunxen force-pushed the 2022-04-sqlite-dumptxoutset branch 3 times, most recently from fad59d2 to 889f6c7 Compare April 24, 2022 19:34
Copy link
Member

@willcl-ark willcl-ark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 889f6c7. Manually inspected a few random rows from the first 100 in the outputted db, which looked good. I left a few additional comments inline.

I also see twice the time and speed on my side:

Screenshot 2022-04-24 at 20 55 09

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
fs::rename(temppath, path);
break;
case UTXOSnapshotFormat::SQLITE:
fs::remove(temppath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why we don't use the temppath for sqlite like we do on compact?

Copy link
Contributor

@w0xlt w0xlt Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temppath adds ".incomplete" to the end of the filename.
path is the path entered by the user, so the database is created using this parameter.

temppath can eventually be removed from CreateUTXOSnapshot() as this parameter is used only once and for logging purposes. I think the same information can be obtained from afile as it is created from temppath.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not used in SQLite branch (nothing actually gets written to it), but it is somewhat tightly coupled, and since we need to do the USE_SQLITE check, it makes it a little difficult to completely decouple. So I just thought to remove it at the end. Acts as a decent indicator that nothing bad happened too (although redundant).

Maybe this can be refactored in the future if we don't want it.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 889f6c7

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@dunxen dunxen force-pushed the 2022-04-sqlite-dumptxoutset branch from 889f6c7 to 2d95dbb Compare April 25, 2022 10:24
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
test/functional/rpc_dumptxoutset.py Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@dunxen dunxen force-pushed the 2022-04-sqlite-dumptxoutset branch 2 times, most recently from 3a761ef to 2d95f24 Compare April 25, 2022 14:07
Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

I first thought creating an external tool to convert from compact-serialized format to SQLite was a viable alternative to implement this directly in bitcoind, but it turned out to be bad idea; one of the reasons is that the decompression logic has to be implemented again, which in one part even needs secp256k1 (#24628 (comment)).

Some initial thoughts:

  • Is there any reason why we store prevoutHash and scriptPubKey with type TEXT rather than a BLOB? Using the latter should reduce the size of the resulting database by almost 2x, and SQLite supports select hex(column) in case the hex representation is needed by the user
  • Not sure if it helps in this scenario or is worthy to investigate, but in some cases I've made good experiences by tuning the cache_size parameter (https://www.sqlite.org/pragma.html#pragma_cache_size) in order to have less frequent writes (the default is 2 MB)

@prusnak
Copy link
Contributor

prusnak commented Apr 25, 2022

  • Is there any reason why we store prevoutHash and scriptPubKey with type TEXT rather than a BLOB?

One of the goals of this change is to enable trivial creation of JSON/CSV files with the data. This can be achieved simply by using one of the following:

  • sqlite3 -header -csv utxo.db "SELECT * FROM utxos" > utxo.csv
  • sqlite3 -json utxo.db "SELECT * FROM utxos" > utxo.json

If BLOB was used instead of hex stored in TEXT, this conversion step could not have been done trivially as one would have to specify a more complex SQL query such as:

SELECT hex(txid), vout, value, coinbase, height, hex(scriptpubkey) FROM utxos

However, it might still be worth a shot to require more complicated SQL query if we really gain 2x the performance and decrease the disk space to 50%.

@dunxen
Copy link
Contributor Author

dunxen commented Apr 25, 2022

  • Not sure if it helps in this scenario or is worthy to investigate, but in some cases I've made good experiences by tuning the cache_size parameter (https://www.sqlite.org/pragma.html#pragma_cache_size) in order to have less frequent writes (the default is 2 MB)

Thanks! I'm happy to investigate tuning this. We're currently roughly getting it to be double the time of a compact dump.

@prusnak
Copy link
Contributor

prusnak commented Apr 25, 2022

We're currently roughly getting it to be double the time of a compact dump.

I think you can't get much better than this. You are writing approx 2x the data (because of hex representation of txid and scriptpubkey and my guess is that the whole operation is I/O bound and not CPU bound.

IMO the only way how to make the process fast is to use BLOB type as suggested by @theStack above at the expense of requiring more complex query when working with the data.

@dunxen
Copy link
Contributor Author

dunxen commented Apr 25, 2022

However, it might still be worth a shot to require more complicated SQL query if we really gain 2x the performance and decrease the disk space to 50%.

I think that's a fair trade-off. Also should not be all that more complicated of a query.

@theStack
Copy link
Contributor

@prusnak: Thanks for elaborating, easy conversion to CSV/JSON etc. makes sense!

One thing that has to be taken care of is the endianness of the txid hash. Unfortunately Bitcoin displays (and accepts, in case of user input for RPCs) txids and block hashes as little endian, but they are stored internally as big endian: https://bitcoin.stackexchange.com/a/2069
That means, if we decide to store in TEXT, then the txid has already the typical form as we know them from wallets, block explorers, debug outputs etc. (hash.GetHex() takes care of that), but on the other hand if we want to use that prevout for e.g. constructing a transaction in a raw byte-by-byte way, then it would need to be reversed first.
If we store it as a BLOB though, it has the internal big endian format, but hex(txid) doesn't show it in the right way, as it would need to be reversed first. Unfortunately, there is no built-in "reverse" operator in SQLite :/ If there was, we could simply do something like the following for the conversion script:
SELECT hex(reverse(txid)), vout, value, coinbase, height, hex(scriptpubkey) FROM utxos

(For the scriptPubKey, there is no such endianness problem as it is simply a sequence of bytes)

So as far as I can see, we have to decide between either easy-conversion or ~2x performance / 50% size (though the latter is still just an assumption). One possible idea would be to store the BLOB already reversed, but this is kind of a something in-between solution that possibly is even more confusing for users.

Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@pk-b2
Copy link

pk-b2 commented May 10, 2022

Is it worth to consider to just write it to stdout while iterating all utxos?

That would make it easy for a consumer to bring it into any required format for further processing or analyzing without having Core to make that decision.

sqlite3 is able to capture from stdin as an import source and most other formats like csv or json can already be processed via the command line. You could even already apply filtering while dumping all records.

@fanquake
Copy link
Member

The grave drawback of the current UTXO dump format though is that it's solely designed for being compact but nothing else, i.e. it's not useful for inspection at all and unusable without further processing; too many fields are compressed in a non-trivial way.

It really sounds like we should be improving the current format (I don't think we need to worry about backwards compatibility too much), given the issues you've outlined, rather than adding a new format, more code, and a dependency we don't want. Otherwise we've still got to maintain the old format, which is compressed (convenient), but not really useful unless you write your own parser (clearly doable), and also maintain the new format, which is seemingly only being introduced because the current format isn't really usable for what everyone wants.

@dunxen dunxen force-pushed the 2022-04-sqlite-dumptxoutset branch from 8fe0e7c to 01720ab Compare May 11, 2022 09:28
@theStack
Copy link
Contributor

It really sounds like we should be improving the current format (I don't think we need to worry about backwards compatibility too much), given the issues you've outlined, rather than adding a new format, more code, and a dependency we don't want. Otherwise we've still got to maintain the old format, which is compressed (convenient), but not really useful unless you write your own parser (clearly doable), and also maintain the new format, which is seemingly only being introduced because the current format isn't really usable for what everyone wants.

The compact-serialized format was introduced as part of the AssumeUTXO project (see #16899) for creating and loading UTXO snapshots. Not sure if at this point changing the specification is desired/needed, I guess for this specific use-case the format is just fine (maybe @jamesob can comment on this?). So it seems that we really have two different goals here: on the one hand a snapshot format to be saved/loaded by bitcoind or even other clients (and that in the most space-efficient way possible), on the other hand providing a neutral easy-to-process dump-format that can be inspected by users. I agree that it would be the best solution if both of these goals could be met by a single format.

@dunxen
Copy link
Contributor Author

dunxen commented May 11, 2022

There's an issue in the latest push with configuring with --disable-wallet --with-sqlite=yes with the build system. I'll take a look in the morning. USE_SQLITE comes out true as expected, though. Maybe something that the wallet flag is guarding.

This allows RPC users of the `dumptxoutset` command to output to a SQLite DB format with human-readable fields.
This approach allows for trivial export to JSON and CSV by using `sqlite3` (detailed in bitcoin#24628).

There is a new optional 'format' parameter for `dumptxoutset` that can take on the values of either 'compact' or 'sqlite'
with 'compact' as the default. This allows for backward compatibility.

Closes bitcoin#24628
@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@laanwj
Copy link
Member

laanwj commented May 31, 2022

. I agree that it would be the best solution if both of these goals could be met by a single format.

This could be argued the other way around too: maybe it's good to have an internal and an external format. The internal format could be changed more easily when necessary for our internal purposes. It's like the block index, the utxo index, etc. Someone can write a perser but there are no guarantees of compatibility. Also, it only needs the data we need to reconstruct from a snapshot, nothing more, nothing less.

However, changes in the external format, a documented API, have to take external clients into account.

@jamesob
Copy link
Member

jamesob commented Jun 6, 2022

Alternatively, the existence of the tool may make the PR obsolete. If it's possible for Core to just maintain a single, compact, dependency-less output format, and it's easy for devs or power users to write & use tools that can convert that compact format into whichever other formats they need, that seems superior to introducing more code (to maintain) into Core, and a new non-wallet sqlite dependency.

So it seems that we really have two different goals here: on the one hand a snapshot format to be saved/loaded by bitcoind or even other clients (and that in the most space-efficient way possible), on the other hand providing a neutral easy-to-process dump-format that can be inspected by users.

@fanquake @theStack yeah, the existing format is intended to be as space efficient as possible since, ideally, that output will be commonly shared across network to help get nodes going. That consideration, in conjunction with

  • external tools needing libsecp to render the compressed data, and
  • the obligation that would be introduced by making a previously opaque data blob "publicly" consumable

makes me sympathetic to this changeset. I think as long as this feature gracefully degrades when sqlite support isn't compiled in, I'm concept ACK and will be happy to review this soon.

@laanwj
Copy link
Member

laanwj commented Jun 22, 2022

Needs a trivial rebase for #24410:

SnapshotMetadata metadata{tip->GetBlockHash(), maybe_stats->coins_count, tip->nChainTx};

Edit: i get the following error locally while running the rpc_dumptxoutset.py test:

 test  2022-06-22T17:09:26.669000Z TestFramework (ERROR): Assertion failed
                                   Traceback (most recent call last):
                                     File "/store/user/projects/bitcoin/bitcoin/test/functional/test_framework/util.py", line 134, in try_rpc
                                       fun(*args, **kwds)
                                     File "/home/user/projects/bitcoin/bitcoin/test/functional/rpc_dumptxoutset.py", line 95, in <lambda>
                                       assert_raises_rpc_error(-32603, 'Unable to open database file', lambda: node.dumptxoutset('bad_path/' + FILENAME, 'sqlite'))
                                     File "/store/user/projects/bitcoin/bitcoin/test/functional/test_framework/coverage.py", line 49, in __call__
                                       return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
                                     File "/store/user/projects/bitcoin/bitcoin/test/functional/test_framework/authproxy.py", line 144, in __call__
                                       raise JSONRPCException(response['error'], status)
                                   test_framework.authproxy.JSONRPCException: Couldn't open file /tmp/test_runner_₿_🏃_20220622_190924/rpc_dumptxoutset_0/node0/regtest/bad_path/txoutset.db.incomplete for writing. (-8)

Looks like it expects error code -32603 but gets -8.

@fanquake
Copy link
Member

Concept NACK - I hope we don't go down this route. We've spent years trying to reduce the number of external dependencies in Bitcoin Core; especially in the option-less build. At this point in the project, I think adding new, or expanding the scope of, dependencies in this code base, should be reserved for things that are actually important/mission critical, i.e libsecp256k1, not for nice-to-have features, that are going to be used by a (relatively) small number of developers and enthusiasts, and not actually by the majority of nodes running this peice of software.

If anything, Bitcoin Core should be removing more code, and creating external tooling for new features, rather than continuing to accumulate functionality in this codebase because it's convenient. In this case, an external tool even exists already (I don't really understand why it needing libsecp as a dependency is a problem). My longer term conern is that if something like this is merged, in 3 months time, we'll have a new pull-request, to add support for using sqlite to dump out logs in json format, or some other thing, that clearly shouldn't be a concern/maintenance burnden of Bitcoin Core.

This PR also gives us somewhat awkward build logic, where an RPC feature is enabled if a user compiles with a certain dependency, that previously, only controlled whether or not you got a certain type of wallet.

How is this going to work with #24202? Are we abandoning that PR in favour of this, or will we end up with a trifecta of output formats?

In any case, this PR (rebased) is currently broken on arm64 macOS, when building with --disable-wallet --with-sqlite. sqlite is available, and building master with the same options works as expected:

  CXXLD    bitcoin-cli
  CXXLD    minisketch/test
  CXXLD    bitcoin-tx
  CXXLD    bitcoin-util
ar: -L/opt/homebrew/Cellar/sqlite/3.38.5/lib: No such file or directory
ar: -lsqlite3: No such file or directory
gmake[2]: *** [Makefile:5897: libbitcoin_node.a] Error 1

@theStack
Copy link
Contributor

@fanquake: Great points raised that should be considered, I agree that an additional external dependency definitely has its drawbacks. Still, the somewhat uncomfortable fact is that right now we don't have a proper "dumb" (sic!) dump format for the UTXO set that only contains pure information and thus can easily be searched/processed. I think there should be one, as the current compact-serialized format has a different purpose and won't change, and considering that the UTXO set is one of the (if not the) most important data-structures on all nodes (be it pruned or non-pruned), as it reflects the current state / distribution of funds with all (visible) spending conditions, I would argue it's worth to make an exception here to the "we don't add convenience features if the same result can also be achieved somehow with external tools" rule. This enables the user to easily inspect that state without the need for further fiddling around between different formats and investing extra time and disk space.

How is this going to work with #24202? Are we abandoning that PR in favour of this, or will we end up with a trifecta of output formats?

I think it's pretty obvious that this PR would obsolete #24202, at least I can't come up with a good reason why anyone would still be interested in a text format if you can already have a superset in the form of a simple but still powerful database (converting from SQLite to CSV format is not a big deal)? Though, if this PR is too controversial, it's probably still better to have #24202 instead of no proper UTXO set dump format at all. A compromise could be #24202 plus as follow-up a trivial tool (e.g. in contrib/ or external) that converts the text output to a SQLite database? (Would maybe still be faster than the current approach of emiting a compact-serialized dump + converting that to SQLite format via utxo_to_sqlite). Very curious to hear others opinions.

@laanwj
Copy link
Member

laanwj commented Jun 23, 2022

How is this going to work with #24202? Are we abandoning that PR in favour of this, or will we end up with a trifecta of output formats?

I think we should settle on two formats. The format for internal use and a format for external use. From there on, it's for people with custom scripts to analyze it or convert it into anything they want (it's not like end users have a use for an UTXO dump anyway). As external format I think sqlite is a surprisingly good choice for these kind of huge data dumps (better than say, JSON, or formats that require writing custom parsers). It should still be possible to compile the node without sqlite though for people (most) who don't need external UTXO exports.

Though, if this PR is too controversial, it's probably still better to have #24202 instead of no proper UTXO set dump format at all. A compromise could be #24202 plus as follow-up a trivial tool (e.g. in contrib/ or external) that converts the text output to a SQLite database? (Would maybe still be faster than the current approach of emiting a compact-serialized dump + converting that to SQLite format via utxo_to_sqlite). Very curious to hear others opinions.

I agree.

@jamesob
Copy link
Member

jamesob commented Jun 23, 2022

I think answers to these questions would be illuminating for comparison to #24202:

  • is there a considerable difference in final output size for CSV vs. sqlite? is one more space efficient than the other?
  • what is the code complexity and runtime of, say, a Python script that writes streaming CSV input into sqlite?

If space is comparable between the two options, and a script that adapts CSV -> sqlite is easy (I think it would be) and relatively quick (not sure here), I would say that #24202 is preferable, since it would ship with all builds. We could even bundle such a script in contrib/.

That said, I think dependency considerations aside, sqlite is probably a much more practical output format for actual consumption. Reading sqlite into, say, pandas for further analysis is as trivial CSV - in fact, based on the function signatures of pandas.read_csv vs. pandas.read_sql_table, sqlite is probably the easier choice.

@theStack
Copy link
Contributor

is there a considerable difference in final output size for CSV vs. sqlite? is one more space efficient than the other?

Did a quick comparison between a SQLite output that I created with this PR ~3 weeks ago (82401330 UTXO entries) and a text output with #24202 that I created just a few minutes ago (82801821 UTXO entries):

$ ls -l ~/.bitcoin/utxos.sqlite ~/.bitcoin/utxo_set.txt
[...] 11009749503 Jun 23 19:20 /home/honeybadger/.bitcoin/utxo_set.txt
[...] 12445270016 Jun  3 15:44 /home/honeybadger/.bitcoin/utxos.sqlite

Both seem to take roughly ~2x of the size of the compact-serialized format, with the text output even being a bit smaller (~10.25 GB vs ~11.59 GB). The vast majority of the space per entry is taken by txid and scriptPubKey, and since we store them as hex-encoded strings (both in this PR and in #24202), it makes sense that the sizes are in a similar area.

what is the code complexity and runtime of, say, a Python script that writes streaming CSV input into sqlite?

Pretty sure that the code complexity is low (seems like sqlite3 can even directly import CSV files and it could be a very simple shell-script? https://stackoverflow.com/a/1045961), while the runtime could be a bit of a show-stopper here. Will try to write such a script within the next days and report the results then.

@JeremyRubin
Copy link
Contributor

Unfortunately, there is no built-in "reverse" operator in SQLite :/ If there was, we could simply do something like the following for the conversion script: SELECT hex(reverse(txid)), vout, value, coinbase, height, hex(scriptpubkey) FROM utxos

Actually, you can use the SUBSTR and LENGTH https://www.sqlite.org/lang_corefunc.html#substr

The substr(X,Y,Z) function returns a substring of input string X that begins with the Y-th character and which is Z characters long. If Z is omitted then substr(X,Y) returns all characters through the end of the string X beginning with the Y-th. The left-most character of X is number 1. If Y is negative then the first character of the substring is found by counting from the right rather than the left. If Z is negative then the abs(Z) characters preceding the Y-th character are returned. If X is a string then characters indices refer to actual UTF-8 characters. If X is a BLOB then the indices refer to bytes.

so in theory:

SELECT hex(SUBSTR(txid, -1, -LENGTH(txid))), vout, value, coinbase, height, hex(scriptpubkey) FROM utxos

works.


In general I think dumping to sqlite sounds like a fine idea. One point of contention though is it would be nice if we could return the SQLITE file over the network somehow, since not all RPC users have local filesystem access.

@theStack
Copy link
Contributor

Unfortunately, there is no built-in "reverse" operator in SQLite :/ If there was, we could simply do something like the following for the conversion script: SELECT hex(reverse(txid)), vout, value, coinbase, height, hex(scriptpubkey) FROM utxos

Actually, you can use the SUBSTR and LENGTH https://www.sqlite.org/lang_corefunc.html#substr

[...]

so in theory:

SELECT hex(SUBSTR(txid, -1, -LENGTH(txid))), vout, value, coinbase, height, hex(scriptpubkey) FROM utxos

works.

Unfortunately that doesn't work. The substring is still extracted from left to right without any reversing operation involved, the only thing possible is to specify the start and end offsets by counting from right to left:

$ sqlite3
SQLite version 3.38.2 2022-03-26 13:51:10
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select hex(substr(x'11223344', -1, -4));
112233

If reversing in SQLite was possible, it would be very tempting to store the txid and scriptPubkey as BLOBs to have more compact dumps that are created faster.

@JeremyRubin
Copy link
Contributor

ah interesting... since we know TXIDs are 32 bytes, one option would be to do a 32 byte expansion of something like:

substr('abc',3, 1) || substr('abc', 2, 1) || substr('abc', 1,1)

as a generated virtual table. this would have no storage overhead, but would have some overhead on retrieval.

@achow101
Copy link
Member

Are you still working on this?

cc @jamesob

@dunxen
Copy link
Contributor Author

dunxen commented Oct 12, 2022

Are you still working on this?

cc @jamesob

Will get back to it this week, been busy with other work :)

@achow101
Copy link
Member

Will get back to it this week, been busy with other work :)

It's been several weeks since this comment with no additional updates, so I'm going to close this for now. If you do get around to working on this, please leave a comment so that it can be reopened.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rpc: dumptxoutset as sqlite file