
cmd: Implement storage import/export #5532

Merged: 2 commits into caddyserver:master, Jun 2, 2023

Conversation

@Phrynobatrachus (Collaborator) commented May 11, 2023

#5313

Thanks for the fun project!

I think I'd probably want to put that in the caddycmd package though (the cmd folder), as it's purely a CLI utility, and not really anything to do with Caddy core. (Assuming nothing new needs to be exported. If so let me know.)

You mean the two functions in caddy.go? There are already separate caddycmd wrappers as is.

In the case of running as a systemd service, the command would need to be run as the caddy user to correctly access the files owned by that user, if the file_system storage module is used.

I started by shelling out to systemctl status caddy for part of the re-run check, but relying only on the --user flag felt a little less awkward. I'm not sure what the best option is, though. Besides file permissions, it seemed necessary to re-exec with the alternate user's $HOME set, for AppDataDir() -> homeDirUnsafe(), so the user has to be specified anyway to get that.

run() vs Run()

// This is a low-level function; most callers
// will want to use Run instead, which also
// updates the config's raw state.

run() makes sense to me for exporting, maybe importing should Run() instead?

@CLAassistant commented May 11, 2023

CLA assistant check
All committers have signed the CLA.

(Resolved review threads on cmd/commandfuncs.go and cmd/commands.go)
@mholt mholt added the feature ⚙️ New feature or request label May 11, 2023
@mholt mholt added this to the v2.7.0 milestone May 11, 2023
@mholt (Member) left a comment

Thanks for this implementation!

While reviewing I thought of something that might make more practical sense... basically a transfer-storage utility that does both export and import at the same time, in lock-step, to avoid blowing up memory and also for simplicity's sake.

We'd just need to figure out how to configure both storage modules.

(Resolved review threads on caddy.go)

Review comment on caddy.go (outdated), on this snippet:

    // storage module and returns the underlying assets.
    func ExportStorage(cfg *Config) (files []struct {
        Name string
        Body []byte
Member:

Loading the entire storage into memory is probably not ideal -- it will blow up for large deployments (where a feature like this is most useful).

Maybe exporting should write directly to an archive.

Although, we're getting into separate tooling territory.

What if, instead, we had a transfer storage utility that simply moved the contents of one storage to another storage? We could do it kv-pair by kv-pair, much more efficiently, directly, without any need to read everything into memory or have separate import/export ops.

Collaborator (author):

What if, instead, we had a transfer storage utility that simply moved the contents of one storage to another storage? We could do it kv-pair by kv-pair, much more efficiently, directly, without any need to read everything into memory or have separate import/export ops.

Yeah that's a good point, like just Load() from old -> Store() in new -> Delete() from old per each key at a time.

Member:

Yes! Errrr, but maybe make Delete() opt-in, and maybe that should only be done after the migration is 100% totally completed successfully.

Member:

Transfer alone might not be enough in situations where you want to move from one cluster to another where the storage is completely isolated in the old vs new. Exporting is also a good way to perform backups.

I think adding transfer is a good idea, but I don't think it should be instead of export/import, I think both should exist.

Member:

I don't think backups are in the scope of the web server. I'd like to leave that up to backup utilities.

I also would like to hear of actual use cases that require an export but not an import -- AFAIK the need is to be able to switch storage providers, migrate certs from one place to another. A plain export can be done with the storage system itself (i.e. copy the files from disk, or run a SQL query) -- importing so that Caddy can use it is harder, and it makes sense to be able to import from any data source.

The more I think about this the more I'm convinced that we need a transfer utility -- effectively an "import", but of course supporting any storage module as a source -- not an exporter.

Member:

Like I said, if the storage is isolated in such a way that Caddy cannot contact both locations (e.g. one is running in AWS and they're migrating to Azure, storage is isolated to the VPC), then transfers cannot work. I don't think we can presume that transfers are always possible. That would be a very unfortunate limitation.

And for backups, that assumes the users know how to perform backups using their storage mechanism, and that they might even care to backup anything other than Caddy's data in that storage. For example for Redis, I might be using it as a cache otherwise, so I don't care to backup anything else.

Also re memory usage of loading everything in storage, I don't think that's a relevant concern. The CLI script is what will be keeping it in memory. Each LE cert+key+json is roughly 6k in size (with default key type etc), so 1000 certs (which is quite large) is ~6MB, and a huge deployment of like 50,000 certs would be like ~300MB; on a deployment that large, they'd surely be using a machine with plenty of RAM, hopefully over-provisioned.

Either way, writing straight to a mutable archive file is a good idea but I wouldn't consider that absolutely necessary.

Collaborator (author):

Also re memory usage of loading everything in storage, I don't think that's a relevant concern.

fwiw this was my first impression as well -- that everything in storage is relatively small both individually and overall, given that the storage is strictly an interface for items critical to operation rather than anything served by itself ("must not be a cache..." note in the docs). I'm definitely not experienced with any large deployments though.

Collaborator (author):

A plain export can be done with the storage system itself (i.e. copy the files from disk, or run a SQL query) -- importing so that Caddy can use it is harder, and it makes sense to be able to import from any data source.

I agree that exporting already has some clear ways to do it. The framing that makes sense to me is that the commands as originally described seem useful to help facilitate transfers/backups (either manually or scripted), but actually doing those would indeed be out of scope. So it's a question of whether it's too little to justify adding rather than too much.

The more I think about this the more I'm convinced that we need a transfer utility -- effectively an "import", but of course supporting any storage module as a source -- not an exporter.

I don't follow quite as much with this. I don't want to muddle things too much, I guess my thought for now is any potential memory issue seems orthogonal to whatever CLI semantics are desired.

Member:

I feel like it's just bad form though -- even if the data happens to only be a few MB in most cases, we do know there are clusters out there with 100,000+ certs. It's so easy to stream the contents file-by-file, so we should probably do that. If someone comes to us complaining that there's a lot of memory use when doing an import/export (because, memory is a shared resource), then I would have no good excuse for that.

Member:

Which is why I'm saying it makes sense to have both import/export and transfer. It's up to the user to pick which is best for them. But only having transfer and not import/export isn't acceptable IMO.

@mholt mholt modified the milestones: v2.7.0, 2.x May 11, 2023
@mholt (Member) commented May 13, 2023

If we aren't going to make this a transfer utility, I'll revisit this after the 2.7 beta 1 release, which is my current priority. I'm still not sure exporting data from storage systems is within the scope of the web server.

@francislavoie (Member):

But only Caddy can provide a unified interface to export/import for any storage implementation, because of the interfaces. I think it's definitely in scope, when storage is a top-level configuration option in Caddy.

But yeah, no rush on this, it's not a blocker for 2.7.0.

@mholt (Member) commented May 13, 2023

An export without an import is just a backup, and that's not Caddy's job. If you want to back up your certs, run rsync on your Caddy data dir, or a SQL dump of your storage DB, or do the equivalent for whatever storage medium you're using. Then you can restore later using that same tooling, no need for Caddy to reinvent the wheel.

Now, an import is definitely useful. But import without source data is useless, so really what we need is an import feature that can get the contents from a configured storage, which so happens to basically be a kind of export.

So, the user can still do what they need (export now, import later), it's just that they don't use Caddy until that import step.

I'm still willing to discuss but I'm not yet convinced that we need 3 separate ops here.

@Phrynobatrachus (Collaborator, author):

I pushed a commit that gets rid of the caddy.go import/export functions and loads the storage modules solely from the caddycmd package. Otherwise left the CLI setup the same for the moment. Hopefully a little closer though!

(Resolved review thread on cmd/commandfuncs.go)
@mholt (Member) left a comment

Getting there!

I still need to decide whether to have this as two commands (import/export) or just one (transfer).

(Resolved review threads on cmd/commandfuncs.go)
@mholt (Member) commented May 25, 2023

@Phrynobatrachus Thanks so much for continuing to work on this and for your patience. It's a high-quality contribution -- I am just dragging my feet on deciding what to do here.

We'll get it in sooner or later though 😅

@Phrynobatrachus (Collaborator, author):

Take your time! I'm here for the practice anyways, it's all good.

@Phrynobatrachus (Collaborator, author) commented May 28, 2023

Some notes from testing against storage implementations with the transfer invocation suggested by Francis:

caddy storage export -c Caddyfile.old -o- | caddy storage import -c Caddyfile.new -i-

(where old is file_system, new is alt storage)

postgres-storage

  • doesn't support recursive parameter in List(), which this PR uses

https://github.com/yroc92/postgres-storage/blob/276797aefe401b738781692d278a158c53b99208/storage.go#L317C2-L319

    if recursive {
        return nil, fmt.Errorf("recursive not supported")
    }

caddy-tlsredis

  • works after a patch to Stat()

https://github.com/gamalan/caddy-tlsredis/blob/66e1c61412decebb977790af25f7a6ce5c9b5878/storageredis.go#L488-L493

    return certmagic.KeyInfo{
        Key:        key,
        Modified:   data.Modified,
        Size:       int64(len(data.Value)),
        IsTerminal: false,
    }, nil

changed to IsTerminal: true, appropriate for all cases?

certmagic-s3

  • works after a patch to Stat()

https://github.com/ss098/certmagic-s3/blob/627ff2e5b5a473e29cc63f15e37cbd6b1b22eeab/s3.go#L244-L249

        return certmagic.KeyInfo{
                Key:        object.Key,
                Modified:   object.LastModified,
                Size:       object.Size,
                IsTerminal: strings.HasSuffix(object.Key, "/"),
        }, err

should be IsTerminal: !strings.HasSuffix(object.Key, "/")

@francislavoie (Member):

Ooh very interesting, thanks for trying that!

I don't know the answer re IsTerminal, Matt will need to answer that.

@mholt (Member) commented May 28, 2023

@Phrynobatrachus Excellent, thanks for the field trials.

changed to IsTerminal: true, appropriate for all cases?

Only when the stat'ed key is a "file" (i.e. not a "directory"). In other words, only when the key doesn't contain other keys within/under it. If that makes sense.

should be IsTerminal: !strings.HasSuffix(object.Key, "/")

Probably true! Good catch.

doesn't support recursive parameter in List(), which this PR uses

@yroc92 Can I help with discussing how to support recursive=true on postgres-storage?


After seeing your trial command, I'm more inclined to be OK with the current import/export situation; however, do you think we will get requests for the export format to be changed or made customizable? (I hope not, heh.) That's my main concern if we go forward with this: people will try to use export, especially, for something it's not designed for.

@francislavoie (Member):

If export always essentially turns whatever storage into the layout it would have under the file_system module, then I think that's pretty natural. It's not like we're introducing a new format of some kind; it's how the storage already exists, just put in a tarball.

IMO the only downside of import/export is that it's loaded in memory before writing out. But I think that's fine for a large majority of users. If a truly huge storage needs to be migrated then we can introduce transfer. But IMO import+export is the minimum viable product here.

@Phrynobatrachus (Collaborator, author):

Only when the stat'ed key is a "file" (i.e. not a "directory"). In other words, only when the key doesn't contain other keys within/under it. If that makes sense.

Yeah, if !strings.HasSuffix(object.Key, "/") is a reasonable expression of that for certmagic-s3, caddy-tlsredis can probably just do the same.

@mholt (Member) commented May 28, 2023

Yeah, if !strings.HasSuffix(object.Key, "/") is a reasonable expression of that for certmagic-s3, caddy-tlsredis can probably just do the same.

I think that depends on how the storage mechanism is implemented. For example, file systems don't necessarily require a trailing / to indicate a directory -- whether it's a directory is stored as metadata.

@Phrynobatrachus (Collaborator, author):

IMO the only downside of import/export is that it's loaded in memory before writing out.

This was only a problem with my original version, where I was unnecessarily buffering the data to export, right? The latest commit, using tar again for IO, looks like it works and shouldn't have that issue, unless I'm mistaken (which is very possible 😛)

@mholt (Member) commented May 30, 2023

Yeah, and since our current storage APIs don't support streaming, the best we can do is write each file in turn. Maybe in the future we'll be able to stream individual files too.

Thanks for the streaming enhancement! That's going to be much better.

I think we can merge it in this week. I need to do a final review.

@mholt mholt added the under review 🧐 Review is pending before merging label May 30, 2023
@mholt (Member) left a comment

Awesome, this is getting close!

See inline comments, plus:

  • Maybe we should move the relevant code (other than the RegisterCommand()) into a file called cmd/storage.go. Since quite a few functions are needed to make it work, we can keep them together.
  • We should add a note to the CLI docs stating that this is experimental and subject to change, maybe: EXPERIMENTAL: This feature is subject to change or removal. -- I don't think we'll remove it, but I want to set clear expectations until we iron out any issues later. :)

Thanks for working on this!

(Resolved review threads on cmd/commandfuncs.go and cmd/commands.go)
These commands use the certmagic.Storage interface. In particular,
storage implementations should ensure that their List() functions
correctly enumerate all keys when called with an empty prefix and
recursive == true. Also, Stat() calls on keys holding values instead
of nested keys are expected to set KeyInfo.IsTerminal = true.
@Phrynobatrachus (Collaborator, author):

Thanks for the reviews. :)

@mholt (Member) left a comment

Awesomesauce, let's try it out!

Thanks for the great contribution.

I'll add you as a collaborator to the repository as we'd be happy to continue collaborating with you, and this way you can make patches or enhancements in a branch and not a fork.

@mholt mholt removed the under review 🧐 Review is pending before merging label Jun 2, 2023
@mholt mholt modified the milestones: 2.x, v2.7.0 Jun 2, 2023
@mholt (Member) commented Jun 2, 2023

I feel like the CI is drunk right now, will try again later

@Phrynobatrachus (Collaborator, author):

Ah, I believe that's my bad actually: errors.Join() is Go 1.20. Looking at it.

@mholt mholt merged commit 078f130 into caddyserver:master Jun 2, 2023
23 checks passed
@mholt (Member) commented Jun 2, 2023

mholt commented Jun 2, 2023

Thank you again!!

@Phrynobatrachus (Collaborator, author):

Cool to see it land! I'll definitely keep an eye out for other contributions, especially anything shaking out from this. Nice working with you both. 👉

@francislavoie (Member):

Btw, if you'd like to open PRs to fix the IsTerminal thing in some of the storage plugins, I'm sure that would be appreciated
