Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[D8][PS] Make config storage swappable #2277

Closed
sentaidigital opened this issue Oct 12, 2016 · 140 comments · Fixed by backdrop/backdrop#4608
Closed

[D8][PS] Make config storage swappable #2277

sentaidigital opened this issue Oct 12, 2016 · 140 comments · Fixed by backdrop/backdrop#4608

Comments

@sentaidigital
Copy link

sentaidigital commented Oct 12, 2016

There is one storage class available for Config to use: ConfigFileStorage. I'd like the option of using a ConfigDatabaseStorage class that would save the same data, but in the database instead of in flat files.

I can think of the following reasons why:

  • Makes it easier to synchronize config changes among multiple web servers in a load balanced environment.
  • Makes it easier to capture the entire state of an environment for debugging elsewhere.

PR: backdrop/backdrop#4453
Replaces: backdrop/backdrop#1613


Summary:

This is a feature request for an option that will likely only be used by a minority of sites running Backdrop. As you can see in the discussion below, there are other reasons it might still be a candidate for core inclusion. This option would be disabled by default, so there would be no change for existing sites, or new installs.

Here is a list of pros and cons for PMC review:

Pros

  • some load-balanced sites may choose use this setting.
  • similarity to Drupal 8 / Drupal 7 / WordPress (w/ config in db)
  • one less data source for devs to download.
  • discourages manual editing of the active configuration (security by obscurity).
  • Hosting platforms that were built for config-in-db projects could be used in the way they were intended (config + content together)
  • possible performance gain for the minority when feature IS in use (needs benchmarks)
  • Security improvement: databases tend to be behind firewalls and only accessible by the webheads, which protects configuration better than an easily misconfigured web server directive protecting one directory in an otherwise published files directory.
  • It's very hard to add this kind of feature in contrib.

Cons:

  • does not benefit at least 80% of sites using Backdrop.
  • increased complexity / more code to support (slightly).
  • database-storage is less visible than files, making Backdrop harder to learn.
  • Hosting platforms that were built for config-in-db projects would no longer benefit from the separation of config and content
  • possible performance impact on the majority when feature is NOT in use (needs benchmarks)
  • possible performance impact on the minority when feature IS in use (needs benchmarks)
  • We would need to do all automated tests w/ both file and db configs to avoid any possible exceptions.
@sentaidigital
Copy link
Author

sentaidigital commented Oct 12, 2016

Getting Backdrop to store configuration in the database required the following changes:

  • Adding a ConfigDatabaseStorage object that mimics filesystem semantics (notably: ctime)
  • Adding a ConfigStorageInterface::initializeStorage() call to let the storage backend create directories or database tables, as appropriate
  • Cleaning up some of the install process to be more storage agnostic
  • Reorder the bootstrap sequence to put BOOTSTRAP_DATABASE on top, before BOOTSTRAP_CONFIGURATION.

This PR is based on and includes the patch for #2275.

To use the patch, set $configuration_directories['active'] = 'db:/default/config_active'; and run the installer.

I am able to use this patch to stand up a site and store configuration changes in the database. The tests are, of course, a complete mess.

@sentaidigital
Copy link
Author

Test are fixed by adding a new Bootstrap stage: BACKDROP_BOOTSTRAP_DATABASE_EARLY, which has all the database bootstrap code, minus the test prefix validation. The validation remains in the original _backdrop_bootstrap_database() function.

Still some work to be done (import/export is broken, for starters), but I'd like someone else to look at the extra bootstrap stage.

@quicksketch
Copy link
Member

Thanks @sentaidigital for making this issue! It's an item I've been thinking about on and off for quite a while. Using a database storage could make a lot of sense in various hosting situations. You're totally right that in a high-availability scenario, the database is much more accessible than the file system (though that should be shared too, and the options are a lot better now than they were a few years ago, e.g. Gluster and Amazon EFS).

Pantheon in particular could benefit from this change, because it would make it so that if you downsynced your database, it would also include your configuration. The configuration directory isn't easily accessible on Pantheon anyway, so storing the configuration in the database might give you a multiple wins.

The downsides are that database-storage is less accessible and visible than the file system for developers. Config files being easily accessible makes the entire operation of Backdrop seem more transparent and simple than it all being stored in the database.

But overall, I think this is a great option to have, just as long as the default is the file system with the option to switch to the database. Speaking of which, do you have any ideas on how you could change your config storage type other than reinstalling? Seems like it might be a good candidate for a Drush issue.

I'd like someone else to look at the extra bootstrap stage.

I left a full review on the PR. The extra bootstrap phase in particular I think we could improve. Here's my comment from the PR:

Rather than adding a new bootstrap phase, would it be possible to just make the config bootstrap phase "upgrade" the boostrap level in the contructor for the new database config storage class? This is the same thing that BackdropDatabaseCache::__contstruct() does. Even though the cache backend is needed at boostrap level 2, it upgraded the level to 3 so that it can serve the page cache from the database.

For illustration, here's the code from BackdropDatabaseCache():

  /**
   * Constructs a new BackdropDatabaseCache object.
   */
  function __construct($bin) {
    // All cache tables should be prefixed with 'cache_', except for the
    // default 'cache' bin.
    if ($bin != 'cache') {
      $bin = 'cache_' . $bin;
    }
    $this->bin = $bin;

    // Bootstrap the database if it is not yet available.
    if (!function_exists('db_query') || backdrop_get_bootstrap_phase() < BACKDROP_BOOTSTRAP_DATABASE) {
      backdrop_bootstrap(BACKDROP_BOOTSTRAP_DATABASE, FALSE);
    }
  }

Seems like we could use that same bootstrap upgrading code in ConfigDatabaseStorage::__construct() and it would do the job of making the database accessible earlier if you're using database-stored configuration.

@sentaidigital
Copy link
Author

If the bootstrap upgrade works, I have no objections.

But overall, I think this is a great option to have, just as long as the default is the file system with the option to switch to the database. Speaking of which, do you have any ideas on how you could change your config storage type other than reinstalling? Seems like it might be a good candidate for a Drush issue.

It's just JSON in the database record. It should be trivial to export from file and import into the database. Even if we add a way to do it in core, a Drush command would still be a good solution. Either way, migrating from one to the other is beyond the scope of this issue.

I'd like to make the installer UX updates a separate issue, too. Right now, you have to specify the database in settings.php before you start the install, which is good enough to let more advanced site builders play with and break it while the UI is built.

Regarding the JSON part (see comments in PR):

Honestly, I'm not happy about the encoded settings part of this patch. My original plan was to store one setting per row in the database (more like the variables table) so developers and sites admins could change individual settings easily with simple SQL statements. I've had to do this a number of times to fix something broken in a D6 or D7 site. (In fact, I submitted a patch for Domain Access to do so.) Editing a scalar or simple array that's been serialized is doable. Editing a large array that has been serialized is error prone, to say the least.

However, the data is sent to the ConfigStorageInterface is everything in the file, not just the deltas, which means write() would have to either delete the entire block of settings before writing out the new data or do some clever merging to prevent unnecessary writes to the database.

Maybe we can change the semantics from full file to individual setting in 2.x, and leave the JSON functions duplicated until then.

@sentaidigital
Copy link
Author

This does not work:

    // Bootstrap the database if it is not yet available.
    if (!function_exists('db_query') || backdrop_get_bootstrap_phase() < BACKDROP_BOOTSTRAP_DATABASE) {
      backdrop_bootstrap(BACKDROP_BOOTSTRAP_DATABASE, FALSE);
    }
  }

But this does:

    // Bootstrap the database if it is not yet available.
    if (!function_exists('db_query') || backdrop_get_bootstrap_phase() < BACKDROP_BOOTSTRAP_DATABASE) {
      require_once BACKDROP_ROOT . '/core/includes/database/database.inc';
    }
  }

The call to backdrop_bootstrap(BACKDROP_BOOTSTRAP_DATABASE, FALSE) does not work because it never calls the _backdrop_bootstrap_database() function. When the constructor calls backdrop_bootstrap(), bootstrap does not update the $final_phase (bootstrap.inc:2668) because $new_phase is FALSE and the current $stored_phase and $final_phase are both 0 (BACKDROP_BOOTSTRAP_CONFIGURATION), so the while loop (bootstrap.inc:2673) never runs.

The bootstrap upgrade does work if we change that while loop from:

    while ($phases && $phase > $stored_phase && $final_phase > $stored_phase) { ... }

to

    while ($phases && ($phase > $stored_phase || $final_phase > $stored_phase)) { ... }

@mikemccaffrey
Copy link

The downsides are that database-storage is less accessible and visible than the file system for developers. Config files being easily accessible makes the entire operation of Backdrop seem more transparent and simple than it all being stored in the database.

I was initially skeptical of D8's decision to move the active configuration back into the database, but after using it for a while, I'm starting to think it was a good idea. We really don't want to encourage manual editing the active configuration of a site directly. We should be telling people to export their configuration, edit the files, and then import the changes.

It's just JSON in the database record. It should be trivial to export from file and import into the database.

Should we be storing JSON in the database? Even if we are importing and exporting JSON to files, it seems like it might make more sense to store serialized PHP objects in the database records. Again, we don't want people making changes to the active configuration of the site, and should instead be leading them towards the export/change/import workflow.

@sentaidigital
Copy link
Author

The most recent hook_update_N() from #2222 revealed some issues in the update code. A couple patches addressing those issues are now here.

The exportArchive() and new importArchive() member functions are updated, but not tested.

@sentaidigital
Copy link
Author

sentaidigital commented Oct 20, 2016

@quicksketch if you have no objections, I'd like to move some of the changes here to the #2275, including any additions to ConfigStorageInterface and their implementation in ConfigFileStorage, including the is_dir() patch from yesterday and the exportArchive() / importArchive() changes.

@sentaidigital
Copy link
Author

Removed the API additions that are now in #2275 and cleaned up the patch to remove the reverted bootstrap changes. This should make the patch easier to review and do a better job keeping both issues in scope.

@jenlampton
Copy link
Member

jenlampton commented Dec 1, 2016

Hi @sentaidigital and thanks for this PR. Unfortunately, I'm not yet sure this is the kind of change we should be making in Backdrop. My instincts are telling me that this kind of feature addition goes directly against Backdrop's principles.

Makes it easier to synchronize config changes among multiple web servers in a load balanced environment.

This is an edge-case scenario. Changes specifically for this group should not be put in Backdrop core without careful consideration. If we start adding features that benefit the only minority of sites, we are going to end up with code bloat and increased complexity that adversely affects the majority.

Makes it easier to capture the entire state of an environment for debugging elsewhere.

You need to capture both the files and the database to get an exact snapshot. If you keep the config in your files directory (as it is by default) there's nothing else to download. Only when you move the configuration out of the files directory do you now need to capture a third thing. And even so, is it really that much more work to grab a config directory than it is to grab the files directory? Is it worth the cost?

Pantheon in particular could benefit from this change, because it would make it so that if you downsynced your database, it would also include your configuration.

But this isn't always wanted. How would you then down-sync your content only (without configuration)? I don't think Pantheon would add a checkbox for "Database with config" and one for "Database without config" in their UI. We separated content from configuration so that these things could be managed separately, and this feature would defeat that purpose.

The downsides are that database-storage is less accessible and visible than the file system for developers. Config files being easily accessible makes the entire operation of Backdrop seem more transparent and simple than it all being stored in the database.

These are some pretty serious downsides. Even if we don't want developers to be changing active config directly, having the files easily accessible is a huge win. It helps people learn what's going on more quickly, and that's very important for less-savvy developers.

I may be wrong about the possible benefits to our majority use-case, but I'd be curious to hear more perspectives.

Either way, I don't think this is a decision we should rush into, and I'm not sure why this issue was tagged with 1.6. We need to more carefully consider the implications of increasing the complexity and decreasing the visibility in exchange for a feature that may only benefit the 1%.

@jenlampton jenlampton modified the milestones: 1.6.0, 1.x-future Dec 1, 2016
@sentaidigital
Copy link
Author

This PR doesn't mandate how developers or vendors should approach configuration management in Backdrop. It only provides an option to store configuration in a database instead of in flat files. Developers, regardless of their savvy, can still use the default ConfigFileStorage to store configuration in files.

Nor does this require that database configuration be stored in the main database. Defining a second database connection $databases['config'] = array(...) and setting the configuration path to db:/config/active_config would preserve the separation of content and configuration.

Nor does it require the staged configuration to live in the database. Developers can still edit the configuration files in the staging directory and then import it in to the active config in the database. Actually, this is how I expect a site using database configuration to be managed.

The implementation here is completely backwards compatible. Existing installations will continue to run with their file-based configuration, exactly as before. In fact, until a future PR updates the install process, the option to use the database for configuration will require someone to set it in their settings.php before running the install process.

There's not a lot of complexity, either. The PR amounts to one additional class and a switch to choose between ConfigFileStorage and ConfigDatabaseStorage. It's a textbook example of how OO development should be done. The only other piece is a small patch in bootstrap.inc to ensure the database is initialized in time to load the config from the database. If the PR looks like more, it's because the first nine commits are the commits from #2275. Splitting the PR into two, one to refactor and clean up the existing, one to add a new storage class, was done to make them both easier to review.

Yes, the load balanced environment argument is an edge case, and isn't the best argument to sell this. Security is a better one. Databases tend to be behind firewalls and only accessible by the webheads, which protects the configuration better than an easily misconfigured web server directive protecting one directory in an otherwise published files directory.

There is also a DX argument to be made. I generally don't need to copy down the files directory, because I can look past the missing images or broken download links. Grabbing the entire configuration and data in a single operation is easier. Which is not to say grabbing both files and database is hard, only that one operation is easier than two operations, and harder to mess up.

@jenlampton
Copy link
Member

jenlampton commented Dec 2, 2016

I'm still concerned about adding a feature that only benefits some extreme minority of sites, increases the amount of code we'd need to maintain, increases the overall complexity of the system, and decreases the visibility of how configuration files work when the feature is used. All of those things go against Backdrop's philosophy.

That concern doesn't mean we shouldn't do this, or that it's a bad idea. All it means is that this decision needs to be considered carefully and not just merged in because the code looks good.

What I'm asking for is more eyes on the problem/solution, more voices weighing in with opinions, and some more time to think about whether it's worthwhile to make this kind of change.

We haven't had any PRs like this that go directly against our principles yet, it will be a good test of our process to see how we handle the decision.

@serundeputy
Copy link
Member

serundeputy commented Dec 3, 2016

I don't think we/Backdrop should relegate high performance websites as an edge case. It seems perfectly reasonable to me that there will be sites that need load balancing and high performance techniques and it seems likely to me that those will be the ones that give Backdrop CMS the confidence boost for others to adopt.

The tipping point for me is performance. @sentaidigital can we get some metrics:

  • single install sites with config in the DB vs. config in files
  • load balanced config in DB vs. config in files

If we can demonstrate that the performance is better or on par with the current situation then I think this is a good add with no impact for people that spin up Backdrop on shared hosting. They will never know about this setting until/and if they ever need it.

I know we target small to medium size businesses and non-profits. I've worked on several non profit sites that are high traffic with load balancing and cacheing in place.

Pantheon in particular could benefit from this change, because it would make it so that if you downsynced your database, it would also include your configuration. The configuration directory isn't easily accessible on Pantheon anyway, so storing the configuration in the database might give you a multiple wins.

  • we can down sync the files dir to on Pantheon. It is an option on the dash for down syncing DB and a separate option to down sync files. So that should work fine in our current scenario.

That is my 2¢; let's see if we can get those metrics.

Here is a write up on how @quicksketch did some benchmarking: https://backdropcms.org/news/backdrop-vs-drupal-7-benchmarks-php-55-56-and-70
and the corresponding post for D8 http://www.jeffgeerling.com/blogs/jeff-geerling/benchmarking-drupal-8-php-7-vs-hhvm

@jenlampton
Copy link
Member

jenlampton commented Dec 3, 2016

I don't think we/Backdrop should relegate high performance websites as an edge case.

Why is less than 1% of sites not an edgecase? That's the exact definition of an edgecase!

It seems perfectly reasonable to me that there will be sites that need load balancing and high performance techniques and it seems likely to me that those will be the ones that give Backdrop CMS the confidence boost for others to adopt.

Yes, I agree with this. Maybe high-profile sites using backdrop will sell others on using it, but that doesn't mean everything they need to do run a Backdrop site needs to go into core.

I don't care so much about performance while the setting is ON, though seeing the metrics might be interesting. There should be a cost associated with keeping the config in the db, but we've never been able to measure it (before now).

FTR: I'm less concerned with this particular issue and more concerned with how Backdrop will handle these edge-case feature requests, and how we do the cost/benefit analysis to figure out if/when requests like this should go in.

@Gormartsen
Copy link
Member

  • Makes it easier to synchronize config changes among multiple web servers in a load balanced environment.
  • Makes it easier to capture the entire state of an environment for debugging elsewhere.

Hi @sentaidigital

I am doing a lot of CI/CD and multiple nodes systems.
I found that keeping configs in json files actually do make deployment and testing much easier.

I keep configs in github repository as a part of the project and all config changes on production brunch get deployed to multiple servers at the same time.

Good part of it is that you can rollback configuration without affecting database data.

Also you can track who and when did changes to specific config and what was a reason.

I suggest you to explore this option.
When properly adopt, it makes development for big projects more secure and stable.
And much easier.

@Gormartsen
Copy link
Member

@jenlampton - I usually works on edge-case and performance critical tasks.
I will be more than happy to do measurements and tests for this type requests and if necessary provide alternative ideas to handle it.

@sentaidigital
Copy link
Author

sentaidigital commented Dec 3, 2016

Quick datapoints:

Test run of this PR using PHP 7 and file-based config: Link run_tests.php takes 4m05s
Test run of this PR using PHP 7 and db-based config: Link run_tests.php takes 3m42s with 1 exception
Same with fixed exception: Link run_tests.php takes 3m43s

There are a lot of variables here. The server may have been under higher load during the stock tests, which were run several days ago. @Gormartsen, I'd appreciate your help coming up with more reliable numbers.

Edit: If these numbers hold, using the database speeds up the test suite by 9%.

Edit 2:
Running the tests on my own VM:

Database: Test run duration: 36 min 36 sec
Files: Test run duration: 38 min 13 sec

Database seems to be no worse than files.

@argiepiano
Copy link

Ah, an idea... the problem seems to be caused by the call to config_get_config_directory()

This will probably break contrib as well - thinking of backup_migrate.

@indigoxela
Copy link
Member

indigoxela commented Apr 22, 2024

This will probably break contrib as well - thinking of backup_migrate.

I did a quick check: yes, B&M is also affected, though in a slightly different way. 😞 It's not able to backup active config.

@quicksketch
Copy link
Member

I'd like to note that the current PR breaks almost every Bee command, if the site's config class is ConfigDatabaseStorage. And it doesn't seem like Bee does anything "dirty" with config. Eventually a bootstrap level problem?

I wasn't able to reproduce a problem with bee. Running bee in DDEV my commands all seem to work fine, even the ones directly manipulating config:

image

All of the following seem to have worked for me:

bee uli
bee state-get install_time
bee config-get system.core site_name
bee config-set system.core site_name New-Value

And I confirmed that the active config is being stored in the database by querying and seeing my new site_name key being set.

select * from config_active ORDER BY changed DESC LIMIT 1;

As for B&M, I would expect it will need to take into account the Database config as a feature improvement. As long as this doesn't break the existing file-based config storage, I don't think that should be a blocker.

@indigoxela I read over your GHA review (thank you!) and it doesn't look like there are any action items. I'm going to put this back to review. Please let me know if I missed something that needs attention.

@indigoxela
Copy link
Member

@quicksketch re Bee - could you provide some details about your setup? I can't seem to get any of the commands you listed working (only "bee st" works). I just get "Exception: The configuration directory... active... does not exist. in config_get_config_directory() (line 235 of .../core/includes/config.inc)". (With latest bee via git, with PHP 8.2)

My relevant settings:

$settings['config_active_class'] = 'ConfigDatabaseStorage';
$config_directories['active'] = NULL;
$config_directories['staging'] = '../bd-config/staging';

I tried combinations with different "active" dir settings, or commenting out. No clue, how this could work.

Is there some leftover active config dir in your install? Maybe from earlier installs, inactive now, but existing, that just makes the check in _backdrop_bootstrap_configuration() pass?

@laryn
Copy link
Contributor

laryn commented Apr 23, 2024

I have been testing locally on a copy of a live site that I've switched over to config in the database. I can confirm the issue with Bee, but everything else I've tested is working very well (changing configuration via the UI, importing/exporting configuration one-off and globally).

@laryn
Copy link
Contributor

laryn commented Apr 23, 2024

It seems as though this check is failing and sending back the default when bee is used:

function config_get_config_storage($type = 'active') {
  $class = settings_get('config_' . $type . '_class', 'ConfigFileStorage');

In settings_get() the global $settings variable is coming back empty for some reason.

@laryn
Copy link
Contributor

laryn commented Apr 23, 2024

When running bee cc all It looks like backdrop_settings_initialize() first gets called from bee_initialize_console() and the values of $settings are set correctly, including config_active_class.

Then bee_process_command() calls bee_bootstrap() and that in turn calls backdrop_bootstrap() and we end up in backdrop_settings_initialize() again. In the spot where settings gets initialized to an empty array, the previously loaded values are zero'ed out, and this time around they are not reset, because this condition is not met:
if (isset($_SERVER['BACKDROP_SETTINGS'])) {

I don't know why that setting is no longer set!

@quicksketch
Copy link
Member

quicksketch commented Apr 24, 2024

Thanks @indigoxela, you guessed correctly. I had left-over files that made it so the config files were being found even though they weren't the ones actually being used.

I remember we had this problem with bee previously, here's the issue: backdrop-contrib/bee#205 (comment)

bee worked-around the problem without patching core by duplicating $settings temporarily and then restoring it.

Now seems like a good time to at least try fixing this in core, I think it might be as simple as not wiping out the $settings array in backdrop_settings_initialize(); that would allow it to be called without losing the existing settings values. I'll update the PR with this approach and see what happens.

@quicksketch
Copy link
Member

By removing a single line that reset the global $settings, it has both fixed bee and still has 100% test coverage passing.

@laryn
Copy link
Contributor

laryn commented Apr 27, 2024

Do we have any core committers who haven’t been involved in this issue in some way? It’s looking great to me…

@quicksketch
Copy link
Member

quicksketch commented Apr 28, 2024

Looks like I didn't fully fix MenuTrailTestCase earlier and it's still intermittently failing.

@quicksketch
Copy link
Member

I put in a new fix for the tests, the failure in MenuTrailTestCase is a weird one where a test can fail if the previous test has unwritten cache entries that need to be saved into the theme registry. The cache tries to write in-between test cases when there isn't an active config directory. I fixed it by adding a backdrop_static_reset() into BackdropWebTestCase::tearDown(), which makes the theme registry write its pending cache entries. This seems to have fully fixed the problem and I got 3 full test runs with 100% passing.

@laryn
Copy link
Contributor

laryn commented Apr 29, 2024

7.5 years, 140 comments, 4 pull requests, many hours of puzzlement later, merged into 1.x as a new feature to be included in 1.28.x!

Big thanks to @sentaidigital and @jlfranklin for starting things off, and to @quicksketch and @hosef for the heavy lifting of getting tests working at the end. With reviews and assists in the middle from @herbdool, @laryn, @indigoxela, @mikemccaffrey, @jenlampton, @serundeputy, @Gormartsen, @klonos, @alexfinnarn, @kiamlaluno, @argiepiano.

A great team effort! 🎉🎉🎉

Reopening for the needed change record.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment