Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Remove gArgs from bdb.h and sqlite.h #23732

Merged

Conversation

kiminuo
Copy link
Contributor

@kiminuo kiminuo commented Dec 10, 2021

Contributes to #21005.

The goal of this PR is to remove gArgs from database classes (i.e. bdb.h and sqlite.h) so that they can be tested without relying on gArgs in tests.

Notes:

  • My goal is to enable unit-testing without relying on gArgs as much as possible. Global variables are hard to reason about which in turn makes it slightly harder to contribute to this codebase. When the compiler does the heavy lifting for us and allows us only to construct an object (or call a method) with valid parameters, we may also save some time in code reviews. The cost for this is passing an argument which is not for free but the cost is very miniscule compared to benefits, I think.
    • GUI code is an exception because it seems fine to have gArgs there so I don't plan to make changes in src/qt folder, for example.
  • My approach to removal of gArgs uses is moving from lower levels to upper ones and pass ArgsManager as an argument as needed. The approach is very similar to what tree-wide: De-globalize ChainstateManager #20158.

@ryanofsky
Copy link
Contributor

In general this seems good, but I think you could shrink the PR and improve it by not exposing ArgsManager to low-level BDB and sqlite code, by instead adding fields to the existing DatabaseOptions struct

struct DatabaseOptions {
  ...
  bool use_unsafe_sync = false;
  bool use_shared_memory = !DEFAULT_WALLET_PRIVDB;
  int64_t checkpoint_log_size = DEFAULT_WALLET_DBLOGSIZE;
};

and adding corresponding m_ members to the SQLiteDatabse and BerkeleyEnvironment classes where necessary. This way, database options are consolidated in one location, less code needs to change, and ArgsManager is not involved in database setup or operation.

@kiminuo
Copy link
Contributor Author

kiminuo commented Dec 10, 2021

Thank you for the suggestion! Yes, that is an alternative. From the design point of view, we seem to balance the following properties:

  • Easiness to use new arguments in bdb and sqlite code.
  • Whether ArgsManager is meant to be basic class for all other code to use. In your approach, it hints it is not. I guess it makes sense not to expand what we consider to be "basic" unless really needed.
  • Both approaches remove gArgs which is my main goal really.

Your sugestion seems reasonable. I'll try to modify it.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 10, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24470 (Disallow more unsafe string->path conversions allowed by path append operators by ryanofsky)
  • #20205 (wallet: Properly support a wallet id by achow101)
  • #19873 (Flush dbcache early if system is under memory pressure by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@achow101
Copy link
Member

I agree with @ryanofsky. I don't think we should be adding ArgsManager to the low level db implementations.

@kiminuo kiminuo force-pushed the feature/2021-12-02-db-without-gArgs-base branch 3 times, most recently from da71387 to 7bf6eab Compare December 16, 2021 09:54
@kiminuo
Copy link
Contributor Author

kiminuo commented Dec 16, 2021

Thank you. I'm slowly trying to refactor it to follow ryanofsky's suggestion.

@kiminuo kiminuo force-pushed the feature/2021-12-02-db-without-gArgs-base branch 2 times, most recently from 865aa4f to 869fab4 Compare December 18, 2021 22:40
@kiminuo
Copy link
Contributor Author

kiminuo commented Dec 19, 2021

@ryanofsky Would the latest approach make more sense to you? There are still rough edges - i.e.:

  • the new names of DatabaseOptions parameters
  • whether it is acceptable to force construction of DatabaseOptions via ArgsManager.
  • whether std::shared_ptr is appropriate for passing of DatabaseOptions. Attempt to go with std::unique_ptr instead?
  • we can also strive for std::shared_ptr<const DatabaseOptions> (with the const modifier).

@ryanofsky
Copy link
Contributor

@ryanofsky Would the latest approach make more sense to you?

There are some good changes here, but I think it is a mistake change DatabaseOptions usage so much and share it so widely. The goal of the suggestion was really to make PR smaller, not larger.

Here's an implementation that just adds a few fields to the DatabaseOptions struct without changing how the struct is used: 17176ef (branch). This version (+90/-60 lines) is smaller than both the original version (+121/-101) e0500ac and the current version (+157/-133) 869fab4.

I think DatabaseOptions shouldn't be a part of the BerkeleyEnvironment / BerkeleyDatabase / SQLiteDatabase classes, because most of the information it exposes is stale and not relevant to them.

I also think ArgsManager shouldn't be required to use the MakeDatabase function or database functionality in general. It should be possible to get database options from other sources like RPC parameters, or GUI selections, or choose different options depending on whether database is opened for a batch operation or continuous use. The point of getting rid of gArgs is not just to get rid of the variable, but to use less shared state in general and make option setting more explicit.

@kiminuo kiminuo force-pushed the feature/2021-12-02-db-without-gArgs-base branch from 869fab4 to 8a1a1fc Compare December 21, 2021 13:42
@kiminuo
Copy link
Contributor Author

kiminuo commented Dec 21, 2021

I think DatabaseOptions shouldn't be a part of the BerkeleyEnvironment / BerkeleyDatabase / SQLiteDatabase classes, because most of the information it exposes is stale and not relevant to them.

I couldn't decide what approach to choose. I went with the more flexible one (in the sense that adding a new option is easy) because that is what gArgs provides too. This of course makes the assumption that we need that flexibility and I understand now that is not needed. Anyway, I like your code and how nicely it is written.

I also think ArgsManager shouldn't be required to use the MakeDatabase function or database functionality in general.

I see. I did it because it helps find all the places where DatabaseOptions are being instantiated and also I like the guarrantee that the object cannot be instantiated incorrectly.

@kiminuo kiminuo force-pushed the feature/2021-12-02-db-without-gArgs-base branch from 8a1a1fc to 0a0a6de Compare December 21, 2021 14:01
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 0a0a6de, which uses suggested changes I posted yesterday and fixes doxygen comments.

I see. I did it because it helps find all the places where DatabaseOptions are being instantiated and also I like the guarrantee that the object cannot be instantiated incorrectly.

This is reasonable and I also added a non-default DatabaseOptions constructor temporarily to help verify the PR would not change behavior. But in general I don't think having an ArgsManager instance should be required to use the database API. And even if shared global options are used everywhere now, they shouldn't be required to be used for new databases and new database operations in the future.

@@ -58,12 +58,12 @@ bool WalletDatabaseFileId::operator==(const WalletDatabaseFileId& rhs) const
* erases the weak pointer from the g_dbenvs map.
* @post A new BerkeleyEnvironment weak pointer is inserted into g_dbenvs if the directory path key was not already in the map.
*/
std::shared_ptr<BerkeleyEnvironment> GetBerkeleyEnv(const fs::path& env_directory)
std::shared_ptr<BerkeleyEnvironment> GetBerkeleyEnv(const fs::path& env_directory, bool use_shared_memory)
Copy link
Member

@laanwj laanwj Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for using "shared memory" here instead of just "private"? Especially as it comes from a "-privdb" flag and is eventually converted to a DB_PRIVATE flag. I don't think any flavor of actual shared memory is used here?
Another option would be to use std::option<fs::path> and a single argument. After all, the path is irrelevant in case of a private database.

Copy link
Contributor

@ryanofsky ryanofsky Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard of a "private database" before this PR and I think that terminology is a little arcane, but shared memory is a concept I'm familiar with, and that's what the flag controls, so that's what i called the DatabaseOptions member. This particular function argument and a few local variables are named to be consistent with the DatabaseOptions field name, but if you want to call them something different, that sounds fine. Maybe say what specific variables you would like to rename and what you would like to call them. If you want everything in bdb.h/bdb.cpp to use a bool db_private instead of bool use_shared_memory that seems reasonable (though IMO use_shared_memory is still clearer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/observerdev/db-4.8.30/blob/3c75267f11336acb1be06bed419e8056cc689d09/dbinc/region.h#L26-L28 - Is this possibly the correct explanation of DB_PRIVATE?

I'm sure it's correct. I don't think is any question about what that flag does. The questions here are just about readability and variable names in this PR. I don't think there's anything that needs to be done yet, except rebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was supposed to be for my information :)

@kiminuo
Copy link
Contributor Author

kiminuo commented Mar 9, 2022

@ryanofsky Would you mind re-ACKing, please?

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 3f730ad. No changes since last review other than rebase


@ryanofsky Would you mind re-ACKing, please?

Sure

src/wallet/db.cpp Outdated Show resolved Hide resolved
src/wallet/bdb.cpp Outdated Show resolved Hide resolved
Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
@kiminuo kiminuo force-pushed the feature/2021-12-02-db-without-gArgs-base branch from 3f730ad to 39b1763 Compare March 16, 2022 07:29
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 39b1763. Just the two small ReadDatabaseArgs and Berkeley open changes that were discussed since the last review

@achow101
Copy link
Member

Since you're basically adding ReadDatabaseArgs everywhere a DatabaseOptions is created, could this function turned into a constructor for DatabaseOptions? This would also help in the future so that we don't forget that we need to set some options based on the cli args.

@ryanofsky
Copy link
Contributor

Since you're basically adding ReadDatabaseArgs everywhere a DatabaseOptions is created, could this function turned into a constructor for DatabaseOptions? This would also help in the future so that we don't forget that we need to set some options based on the cli args.

My thinking is that I would like it to be possible to call wallet database functions without having any ArgsManager. It's true if that wallet database code doesn't depend on ArgsManager, it can't require an ArgsManager argument, and it is theoretically possible for a bug to arise where ArgsManager options are not applied despite it making sense to apply them. But I think potential for having this type of bug is a reasonable tradeoff for wallet database code to be more modular.

I wouldn't oppose if you want to DatabaseOptions to require an ArgsManager instance to be constructed, but I don't think it's the best approach. Also doing this would require changing the AddArg code and maybe reintroducing the DEFAULT_WALLET_DBLOGSIZE and DEFAULT_WALLET_PRIVDB constants I thought we would be happy to get rid of.

@achow101
Copy link
Member

ACK 39b1763

@kiminuo
Copy link
Contributor Author

kiminuo commented Mar 23, 2022

Since you're basically adding ReadDatabaseArgs everywhere a DatabaseOptions is created, could this function turned into a constructor for DatabaseOptions? This would also help in the future so that we don't forget that we need to set some options based on the cli args.

I actually did that in a previous iteration of this PR but @ryanofsky persuaded me that it was/is better to avoid doing that.

@maflcko maflcko merged commit 98e9d8e into bitcoin:master Mar 24, 2022
@kiminuo kiminuo deleted the feature/2021-12-02-db-without-gArgs-base branch March 24, 2022 12:24
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 2, 2022
@bitcoin bitcoin locked and limited conversation to collaborators Mar 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants