cmd: Implement storage import/export #5532
Conversation
Thanks for this implementation!
While reviewing I thought of something that might make more practical sense... basically a transfer-storage
utility that does both export and import at the same time, in lock-step, to avoid blowing up memory and also for simplicity's sake.
We'd just need to figure out how to configure both storage modules.
caddy.go (outdated)

```go
// storage module and returns the underlying assets.
func ExportStorage(cfg *Config) (files []struct {
	Name string
	Body []byte
```
Loading the entire storage into memory is probably not ideal -- it will blow up for large deployments (where a feature like this is most useful).
Maybe exporting should write directly to an archive.
Although, we're getting into separate tooling territory.
What if, instead, we had a transfer storage
utility that simply moved the contents of one storage to another storage? We could do it kv-pair by kv-pair, much more efficiently, directly, without any need to read everything into memory or have separate import/export ops.
> What if, instead, we had a transfer storage utility that simply moved the contents of one storage to another storage? We could do it kv-pair by kv-pair, much more efficiently, directly, without any need to read everything into memory or have separate import/export ops.
Yeah, that's a good point, like just Load() from old -> Store() in new -> Delete() from old, one key at a time.
Yes! Errrr, but maybe make Delete() opt-in, and maybe that should only be done after the migration is 100% totally completed successfully.
Transfer alone might not be enough in situations where you want to move from one cluster to another where the storage is completely isolated in the old vs new. Exporting is also a good way to perform backups.
I think adding transfer is a good idea, but I don't think it should be instead of export/import, I think both should exist.
I don't think backups are in the scope of the web server. I'd like to leave that up to backup utilities.
I also would like to hear of actual use cases that require an export but not an import -- AFAIK the need is to be able to switch storage providers, migrate certs from one place to another. A plain export can be done with the storage system itself (i.e. copy the files from disk, or run a SQL query) -- importing so that Caddy can use it is harder, and it makes sense to be able to import from any data source.
The more I think about this the more I'm convinced that we need a transfer utility -- effectively an "import", but of course supporting any storage module as a source -- not an exporter.
Like I said, if the storage is isolated in such a way that Caddy cannot contact both locations (e.g. one is running in AWS and they're migrating to Azure, storage is isolated to the VPC), then transfers cannot work. I don't think we can presume that transfers are always possible. That would be a very unfortunate limitation.
And for backups, that assumes the users know how to perform backups using their storage mechanism, and that they might even care to backup anything other than Caddy's data in that storage. For example for Redis, I might be using it as a cache otherwise, so I don't care to backup anything else.
Also re memory usage of loading everything in storage, I don't think that's a relevant concern. The CLI script is what will be keeping it in memory. Each LE cert+key+json is roughly 6k in size (with default key type etc), so 1000 certs (which is quite large) is ~6MB, and a huge deployment of like 50,000 certs would be like ~300MB; on a deployment that large, they'd surely be using a machine with plenty of RAM, hopefully over-provisioned.
Either way, writing straight to a mutable archive file is a good idea but I wouldn't consider that absolutely necessary.
> Also re memory usage of loading everything in storage, I don't think that's a relevant concern.
fwiw this was my first impression as well -- that everything in storage is relatively small both individually and overall, given that the storage is strictly an interface for items critical to operation rather than anything served by itself ("must not be a cache..." note in the docs). I'm definitely not experienced with any large deployments though.
> A plain export can be done with the storage system itself (i.e. copy the files from disk, or run a SQL query) -- importing so that Caddy can use it is harder, and it makes sense to be able to import from any data source.

I agree that exporting already has some clear ways to do it. The framing that makes sense to me is that the commands as originally described seem useful for facilitating transfers/backups (either manually or scripted), but actually performing those would indeed be out of scope. So it's a question of whether it's too little to justify adding, rather than too much.

> The more I think about this the more I'm convinced that we need a transfer utility -- effectively an "import", but of course supporting any storage module as a source -- not an exporter.

I don't follow this part quite as much. I don't want to muddle things too much; my thought for now is that any potential memory issue seems orthogonal to whatever CLI semantics are desired.
I feel like it's just bad form though -- even if the data happens to only be a few MB in most cases, we do know there are clusters out there with 100,000+ certs. It's so easy to stream the contents file-by-file, so we should probably do that. If someone comes to us complaining that there's a lot of memory use when doing an import/export (because, memory is a shared resource), then I would have no good excuse for that.
Which is why I'm saying it makes sense to have both import/export and transfer. It's up to the user to pick which is best for them. But only having transfer and not import/export isn't acceptable IMO.
If we aren't going to make this a transfer utility, I'll revisit this after the 2.7 beta 1 release, which is my current priority. I'm still not sure exporting data from storage systems is within the scope of the web server.
But only Caddy can provide a unified interface to export/import for any storage implementation, because of the interfaces. I think it's definitely in scope. But yeah, no rush on this; it's not a blocker for 2.7.0.
An export without an import is just a backup, and that's not Caddy's job; if you want to back up your certs, run … Now, an import is definitely useful. But an import without source data is useless, so really what we need is an import feature that can get the contents from a configured storage, which happens to basically be a kind of export. So the user can still do what they need (export now, import later); it's just that they don't use Caddy until that import step. I'm still willing to discuss, but I'm not yet convinced that we need 3 separate ops here.
I pushed a commit that gets rid of the …
Getting there!
I still need to decide whether to have this as two commands (import/export) or just one (transfer).
@Phrynobatrachus Thanks so much for continuing to work on this and for your patience. It's a high-quality contribution -- I am just dragging my feet on deciding what to do here. We'll get it in sooner or later though 😅
Take your time! I'm here for the practice anyway; it's all good.
Some notes from testing against storage implementations with the transfer invocation suggested by Francis (where old is file_system, new is the alternate storage):

postgres-storage

```go
if recursive {
	return nil, fmt.Errorf("recursive not supported")
}
```

caddy-tlsredis

```go
return certmagic.KeyInfo{
	Key:        key,
	Modified:   data.Modified,
	Size:       int64(len(data.Value)),
	IsTerminal: false,
}, nil
```

changed to …

certmagic-s3

https://github.com/ss098/certmagic-s3/blob/627ff2e5b5a473e29cc63f15e37cbd6b1b22eeab/s3.go#L244-L249

```go
return certmagic.KeyInfo{
	Key:        object.Key,
	Modified:   object.LastModified,
	Size:       object.Size,
	IsTerminal: strings.HasSuffix(object.Key, "/"),
}, err
```

should be …
Ooh, very interesting, thanks for trying that! I don't know the answer re …
@Phrynobatrachus Excellent, thanks for the field trials.
Only when the stat'ed key is a "file" (i.e. not a "directory"). In other words, only when the key doesn't contain other keys within/under it. If that makes sense.
Probably true! Good catch.
@yroc92 Can I help with discussing how to support …? After seeing your trial command, I'm more inclined to be OK with the current import/export situation; however, do you think we will get requests for the export format to be changed or made customizable? (I hope not, heh.) That's my main concern if we go forward with this... that people will try to use …
If export always essentially turns whatever storage into the layout it would have under the filesystem module, then I think that's pretty natural. It's not like we're introducing a new format of some kind; it's just how the storage already exists, put in a tarball. IMO the only downside of import/export is that everything is loaded into memory before writing out, but I think that's fine for the large majority of users. If a truly huge storage needs to be migrated, then we can introduce …
Yeah, if …
I think that depends on how the storage mechanism is implemented. For example, file systems don't necessarily require a trailing …
This was only a problem with my original version, where I was unnecessarily buffering the data to export, right? The latest commit, which uses tar again for IO, looks like it works and shouldn't have that issue, unless I'm mistaken (which is very possible 😛).
Yeah, and since our current storage APIs don't support streaming, the best we can do is write each file in turn. Maybe in the future we'll be able to stream individual files too. Thanks for the streaming enhancement! That's going to be much better. I think we can merge it this week; I need to do a final review.
Awesome, this is getting close!
See inline comments, plus:
- Maybe we should move the relevant code (other than the `RegisterCommand()`) into a file called `cmd/storage.go`; since there are quite a few functions needed to make this work, we can keep them together.
- We should add a note to the CLI docs stating that this is experimental and subject to change, maybe: "EXPERIMENTAL: This feature is subject to change or removal." I don't think we'll remove it, but I want to set clear expectations until we iron out any issues later. :)

Thanks for working on this!
These commands use the certmagic.Storage interface. In particular, storage implementations should ensure that their List() functions correctly enumerate all keys when called with an empty prefix and recursive == true. Also, Stat() calls on keys holding values instead of nested keys are expected to set KeyInfo.IsTerminal = true.
Thanks for the reviews. :)
Awesomesauce, let's try it out!
Thanks for the great contribution.
I'll add you as a collaborator to the repository as we'd be happy to continue collaborating with you, and this way you can make patches or enhancements in a branch and not a fork.
I feel like the CI is drunk right now, will try again later.
Ah, I believe that's my bad actually, …
Thank you again!!
It's cool to land it! I'll definitely keep an eye out for other contributions, especially anything shaking out from this. Nice working with you both. 👉
Btw, if you'd like to open PRs to fix the …
#5313
Thanks for the fun project!
The two functions in `caddy.go`, that is? There are caddycmd wrappers separately as is.

I started by shelling out to `systemctl status caddy` for part of the rerun check, but relying only on the `--user` flag felt a little less awkward. I'm not sure what the best option is, though. Besides file permissions, it seemed necessary to re-exec with the corresponding alternate user's `$HOME` set (for `AppDataDir()` -> `homeDirUnsafe()`), so the user has to be specified anyway to get that.

Open questions:

- Windows handling (AppData vs HOME), etc.?
- Add `CAP_SETUID`/`CAP_SETGID` to `setcap.sh`?
- fork/exec error on the user switch without `NoSetGroups: true`: https://walac.github.io/golang-patch/
- `run()` vs `Run()`: `run()` makes sense to me for exporting; maybe importing should `Run()` instead?