
cmd: Implement storage import/export #5532

Merged: 2 commits into caddyserver:master, Jun 2, 2023

Conversation

@Phrynobatrachus (Collaborator) commented May 11, 2023

#5313

Thanks for the fun project!

I think I'd probably want to put that in the caddycmd package though (the cmd folder), as it's purely a CLI utility, and not really anything to do with Caddy core. (Assuming nothing new needs to be exported. If so let me know.)

You mean the two functions in caddy.go? There are already separate caddycmd wrappers as is.

In the case of running as a systemd service, the command would need to be run as the caddy user to correctly access the files owned by that user, if the file_system storage module is used.

I started by shelling out to systemctl status caddy for part of the re-run check, but relying only on the --user flag felt a little less awkward. I'm not sure what the best option is, though. Besides file permissions, it seemed necessary to re-exec with the alternate user's $HOME set, for AppDataDir() -> homeDirUnsafe(), so the user has to be specified anyway to get that.

run() vs Run()

// This is a low-level function; most callers
// will want to use Run instead, which also
// updates the config's raw state.

run() makes sense to me for exporting, maybe importing should Run() instead?

@CLAassistant commented May 11, 2023

CLA assistant check
All committers have signed the CLA.

(Resolved review threads on cmd/commandfuncs.go and cmd/commands.go)
@mholt mholt added the feature ⚙️ New feature or request label May 11, 2023
@mholt mholt added this to the v2.7.0 milestone May 11, 2023
@mholt (Member) left a comment

Thanks for this implementation!

While reviewing I thought of something that might make more practical sense... basically a transfer-storage utility that does both export and import at the same time, in lock-step, to avoid blowing up memory and also for simplicity's sake.

We'd just need to figure out how to configure both storage modules.

(Resolved review threads on caddy.go)

Review comment on caddy.go (outdated), on this snippet:

    // storage module and returns the underlying assets.
    func ExportStorage(cfg *Config) (files []struct {
        Name string
        Body []byte
Member:

Loading the entire storage into memory is probably not ideal -- it will blow up for large deployments (where a feature like this is most useful).

Maybe exporting should write directly to an archive.

Although, we're getting into separate tooling territory.

What if, instead, we had a transfer storage utility that simply moved the contents of one storage to another storage? We could do it kv-pair by kv-pair, much more efficiently, directly, without any need to read everything into memory or have separate import/export ops.

Collaborator (author):

What if, instead, we had a transfer storage utility that simply moved the contents of one storage to another storage? We could do it kv-pair by kv-pair, much more efficiently, directly, without any need to read everything into memory or have separate import/export ops.

Yeah that's a good point, like just Load() from old -> Store() in new -> Delete() from old per each key at a time.

Member:

Yes! Errrr, but maybe make Delete() opt-in, and maybe that should only be done after the migration is 100% totally completed successfully.

Member:

Transfer alone might not be enough in situations where you want to move from one cluster to another where the storage is completely isolated in the old vs new. Exporting is also a good way to perform backups.

I think adding transfer is a good idea, but I don't think it should be instead of export/import, I think both should exist.

Member:

I don't think backups are in the scope of the web server. I'd like to leave that up to backup utilities.

I also would like to hear of actual use cases that require an export but not an import -- AFAIK the need is to be able to switch storage providers, migrate certs from one place to another. A plain export can be done with the storage system itself (i.e. copy the files from disk, or run a SQL query) -- importing so that Caddy can use it is harder, and it makes sense to be able to import from any data source.

The more I think about this the more I'm convinced that we need a transfer utility -- effectively an "import", but of course supporting any storage module as a source -- not an exporter.

Member:

Like I said, if the storage is isolated in such a way that Caddy cannot contact both locations (e.g. one is running in AWS and they're migrating to Azure, storage is isolated to the VPC), then transfers cannot work. I don't think we can presume that transfers are always possible. That would be a very unfortunate limitation.

And for backups, that assumes the users know how to perform backups using their storage mechanism, and that they might even care to backup anything other than Caddy's data in that storage. For example for Redis, I might be using it as a cache otherwise, so I don't care to backup anything else.

Also re memory usage of loading everything in storage, I don't think that's a relevant concern. The CLI script is what will be keeping it in memory. Each LE cert+key+json is roughly 6k in size (with default key type etc), so 1000 certs (which is quite large) is ~6MB, and a huge deployment of like 50,000 certs would be like ~300MB; on a deployment that large, they'd surely be using a machine with plenty of RAM, hopefully over-provisioned.

Either way, writing straight to a mutable archive file is a good idea but I wouldn't consider that absolutely necessary.

Collaborator (author):

Also re memory usage of loading everything in storage, I don't think that's a relevant concern.

fwiw this was my first impression as well -- that everything in storage is relatively small both individually and overall, given that the storage is strictly an interface for items critical to operation rather than anything served by itself ("must not be a cache..." note in the docs). I'm definitely not experienced with any large deployments though.

Collaborator (author):

A plain export can be done with the storage system itself (i.e. copy the files from disk, or run a SQL query) -- importing so that Caddy can use it is harder, and it makes sense to be able to import from any data source.

I agree that exporting already has some clear ways to do it. The framing that makes sense to me is that the commands as originally described seem useful to help facilitate transfers/backups (either manually or scripted), but actually doing those would indeed be out of scope. So it's a question of whether it's too little to justify adding rather than too much.

The more I think about this the more I'm convinced that we need a transfer utility -- effectively an "import", but of course supporting any storage module as a source -- not an exporter.

I don't follow quite as much with this. I don't want to muddle things too much, I guess my thought for now is any potential memory issue seems orthogonal to whatever CLI semantics are desired.

Member:

I feel like it's just bad form though -- even if the data happens to only be a few MB in most cases, we do know there are clusters out there with 100,000+ certs. It's so easy to stream the contents file-by-file, so we should probably do that. If someone comes to us complaining that there's a lot of memory use when doing an import/export (because, memory is a shared resource), then I would have no good excuse for that.

Member:

Which is why I'm saying it makes sense to have both import/export and transfer. It's up to the user to pick which is best for them. But only having transfer and not import/export isn't acceptable IMO.

@mholt mholt modified the milestones: v2.7.0, 2.x May 11, 2023
@mholt (Member) commented May 13, 2023

If we aren't going to make this a transfer utility, I'll revisit this after the 2.7 beta 1 release, which is my current priority. I'm still not sure exporting data from storage systems is within the scope of the web server.

@francislavoie (Member):

But only Caddy can provide a unified interface to export/import for any storage implementation, because of the interfaces. I think it's definitely in scope, when storage is a top-level configuration option in Caddy.

But yeah, no rush on this, it's not a blocker for 2.7.0.

@mholt (Member) commented May 13, 2023

An export without an import is just a backup, and that's not Caddy's job. If you want to back up your certs, run rsync on your Caddy data dir, or a SQL dump of your storage DB, or do the equivalent for whatever storage medium you're using. Then you can restore later using that same tooling, no need for Caddy to reinvent the wheel.

Now, an import is definitely useful. But import without source data is useless, so really what we need is an import feature that can get the contents from a configured storage, which so happens to basically be a kind of export.

So, the user can still do what they need (export now, import later), it's just that they don't use Caddy until that import step.

I'm still willing to discuss but I'm not yet convinced that we need 3 separate ops here.

@Phrynobatrachus (Collaborator, author):

I pushed a commit that gets rid of the caddy.go import/export functions and loads the storage modules solely from the caddycmd package. Otherwise left the CLI setup the same for the moment. Hopefully a little closer though!

(Resolved review thread on cmd/commandfuncs.go)
@mholt (Member) left a comment

Getting there!

I still need to decide whether to have this as two commands (import/export) or just one (transfer).

(Resolved review threads on cmd/commandfuncs.go)
@mholt (Member) commented May 25, 2023

@Phrynobatrachus Thanks so much for continuing to work on this and for your patience. It's a high-quality contribution -- I am just dragging my feet on deciding what to do here.

We'll get it in sooner or later though 😅

@Phrynobatrachus (Collaborator, author):

Take your time! I'm here for the practice anyways, it's all good.

@Phrynobatrachus (Collaborator, author) commented May 28, 2023

Some notes from testing against storage implementations with the transfer invocation suggested by Francis:

caddy storage export -c Caddyfile.old -o- | caddy storage import -c Caddyfile.new -i-

(where old is file_system, new is alt storage)

postgres-storage

  • doesn't support recursive parameter in List(), which this PR uses

https://github.com/yroc92/postgres-storage/blob/276797aefe401b738781692d278a158c53b99208/storage.go#L317C2-L319

    if recursive {
        return nil, fmt.Errorf("recursive not supported")
    }

caddy-tlsredis

  • works after a patch to Stat()

https://github.com/gamalan/caddy-tlsredis/blob/66e1c61412decebb977790af25f7a6ce5c9b5878/storageredis.go#L488-L493

    return certmagic.KeyInfo{
        Key:        key,
        Modified:   data.Modified,
        Size:       int64(len(data.Value)),
        IsTerminal: false,
    }, nil

changed to IsTerminal: true, appropriate for all cases?

certmagic-s3

  • works after a patch to Stat()

https://github.com/ss098/certmagic-s3/blob/627ff2e5b5a473e29cc63f15e37cbd6b1b22eeab/s3.go#L244-L249

        return certmagic.KeyInfo{
                Key:        object.Key,
                Modified:   object.LastModified,
                Size:       object.Size,
                IsTerminal: strings.HasSuffix(object.Key, "/"),
        }, err

should be IsTerminal: !strings.HasSuffix(object.Key, "/")

@francislavoie (Member):

Ooh very interesting, thanks for trying that!

I don't know the answer re IsTerminal, Matt will need to answer that.

@mholt (Member) commented May 28, 2023

@Phrynobatrachus Excellent, thanks for the field trials.

changed to IsTerminal: true, appropriate for all cases?

Only when the stat'ed key is a "file" (i.e. not a "directory"). In other words, only when the key doesn't contain other keys within/under it. If that makes sense.

should be IsTerminal: !strings.HasSuffix(object.Key, "/")

Probably true! Good catch.

doesn't support recursive parameter in List(), which this PR uses

@yroc92 Can I help with discussing how to support recursive=true on postgres-storage?


After seeing your trial command, I'm more inclined to be OK with the current import/export situation; however, do you think we will get requests for the export format to be changed or made customizable? (I hope not, heh.) That's my main concern if we go forward with this: people will try to use export, especially, for something it's not designed for.

@francislavoie (Member):

If export always essentially turns whatever storage into the layout it would have under the file_system module, then I think that's pretty natural. It's not like we're introducing a new format of some kind; it's how the storage already exists, just put in a tarball.

IMO the only downside of import/export is that it's loaded in memory before writing out. But I think that's fine for a large majority of users. If a truly huge storage needs to be migrated then we can introduce transfer. But IMO import+export is the minimum viable product here.

@Phrynobatrachus (Collaborator, author):

Only when the stat'ed key is a "file" (i.e. not a "directory"). In other words, only when the key doesn't contain other keys within/under it. If that makes sense.

Yeah, if !strings.HasSuffix(object.Key, "/") is a reasonable expression of that for certmagic-s3, caddy-tlsredis can probably just do the same.

@mholt (Member) commented May 28, 2023

Yeah, if !strings.HasSuffix(object.Key, "/") is a reasonable expression of that for certmagic-s3, caddy-tlsredis can probably just do the same.

I think that depends on how the storage mechanism is implemented. For example, file systems don't necessarily require a trailing / to indicate a directory -- whether it's a directory is stored as metadata.

@Phrynobatrachus (Collaborator, author):

IMO the only downside of import/export is that it's loaded in memory before writing out.

This was only a problem with my original version, where I was unnecessarily buffering the data to export, right? The latest commit, using tar again for IO, looks like it works and shouldn't have that issue, unless I'm mistaken (which is very possible 😛)

@mholt (Member) commented May 30, 2023

Yeah, and since our current storage APIs don't support streaming, the best we can do is write each file in turn. Maybe in the future we'll be able to stream individual files too.

Thanks for the streaming enhancement! That's going to be much better.

I think we can merge it in this week. I need to do a final review.

@mholt mholt added the under review 🧐 Review is pending before merging label May 30, 2023
@mholt (Member) left a comment

Awesome, this is getting close!

See inline comments, plus:

  • Maybe we should move the relevant code (other than the RegisterCommand()) into a file called cmd/storage.go. Since quite a few functions are needed to make it work, we can keep them together.
  • We should add a note to the CLI docs stating that this is experimental and subject to change, maybe: EXPERIMENTAL: This feature is subject to change or removal. -- I don't think we'll remove it, but I want to set clear expectations until we iron out any issues later. :)

Thanks for working on this!

(Resolved review threads on cmd/commandfuncs.go and cmd/commands.go)
These commands use the certmagic.Storage interface. In particular,
storage implementations should ensure that their List() functions
correctly enumerate all keys when called with an empty prefix and
recursive == true. Also, Stat() calls on keys holding values instead
of nested keys are expected to set KeyInfo.IsTerminal = true.
@Phrynobatrachus (Collaborator, author):

Thanks for the reviews. :)

@mholt (Member) left a comment

Awesomesauce, let's try it out!

Thanks for the great contribution.

I'll add you as a collaborator to the repository as we'd be happy to continue collaborating with you, and this way you can make patches or enhancements in a branch and not a fork.

@mholt mholt removed the under review 🧐 Review is pending before merging label Jun 2, 2023
@mholt mholt modified the milestones: 2.x, v2.7.0 Jun 2, 2023
@mholt (Member) commented Jun 2, 2023

I feel like the CI is drunk right now, will try again later

@Phrynobatrachus (Collaborator, author):

Ah, I believe that's my bad actually: errors.Join() is Go 1.20. Looking at it.

@mholt mholt merged commit 078f130 into caddyserver:master Jun 2, 2023
23 checks passed
@mholt (Member) commented Jun 2, 2023

mholt commented Jun 2, 2023

Thank you again!!

@Phrynobatrachus (Collaborator, author):

Cool to see it land! I'll definitely keep an eye out for other contributions, especially anything shaking out from this. Nice working with you both. 👉

@francislavoie (Member):

Btw, if you'd like to open PRs to fix the IsTerminal thing in some of the storage plugins, I'm sure that would be appreciated
