Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctl_mboxlist dump/undump miss new fields #3896

Open
elliefm opened this issue Feb 3, 2022 · 6 comments
Open

ctl_mboxlist dump/undump miss new fields #3896

elliefm opened this issue Feb 3, 2022 · 6 comments
Assignees
Labels
3.4 affects 3.4

Comments

@elliefm
Copy link
Contributor

elliefm commented Feb 3, 2022

ctl_mboxlist -d dumps mailboxes.db to a flat file, and ctl_mboxlist -u undumps such a file into a new mailboxes.db

We document these as the correct tools for doing conversions of mailboxes.db (e.g. "improved_mboxlist_sort" says it requires a dump/undump when changing its value).

However, since some version long ago (3.0 maybe? 2.5?) this tool has become stale. It does not know about all the things we store in mailboxes.db, and consequently that data is lost across a dump/undump.

There is already a fix for this for 3.4 in #3384, but 3.6 and master introduce additional new fields that this fix doesn't take into account, so it's already stale as soon as 3.6 lands. The new fields are multi-valued, so "just" extending this work to accommodate them will be quite complicated.

After some discussion at https://cyrus.topicbox.com/groups/devel/Tea9b3c8c5728d1e8-M7b68e56d5c41dafa2acbba63, the plan for 3.6 is to change the dump/undump format to use json and make sure all fields are properly dumped and undumped. Ideally we'll also preserve the ability to read the older format, if possible, so that people can dump/undump as part of an upgrade process if they wish (though they will still lose any data their older server failed to dump).

I expect to start working on this after the first 3.6 beta is released, but will intend to backport it into whichever 3.6 release comes out after it's done.

@elliefm elliefm added the 3.6 affects 3.6 label Feb 3, 2022
@elliefm elliefm self-assigned this Feb 3, 2022
@elliefm
Copy link
Contributor Author

elliefm commented Feb 23, 2022

There are recover (-r) and checkpoint (-c) implementations in here that have been "deprecated" in favour of ctl_cyrusdb since 2001, and aren't documented anywhere else. I'm gonna just rip them out while I'm in here.

For some bizarre historical reason the dump implementation is deeply interwoven with the populate-mupdate implementation? So I guess I can see why this has been neglected so long, it's a mess. Gonna separate the two actions first, so when I refactor dump, I'm not tripping over unrelated mupdate stuff.

@elliefm
Copy link
Contributor Author

elliefm commented Feb 23, 2022

@elliefm
Copy link
Contributor Author

elliefm commented Feb 25, 2022

It occurs to me that, since the dump loop is initiated by mboxlist_allmbox(), that this still can only dump records that are actually mailboxes, and misses all the other stuff that mailboxes.db stores.

I think the right thing to do will be to cyrusdb_foreach over everything in there, and implement the dump/undump as dlist<->json converters?

@elliefm
Copy link
Contributor Author

elliefm commented Feb 25, 2022

Though if the dump format has no semantics of its own and is just a direct json version of the internal dlist, then you wouldn't be able to use dump/undump to move data between versions, because no translation would occur along the way. Hmm.

@elliefm
Copy link
Contributor Author

elliefm commented Feb 27, 2022

Given that this is ctl_mboxlist, I think it should care primarily about the mboxlist, and the dump format should be semantically coherent for that specific purpose.

If we do eventually want generic "dump/restore every key and value in this database, without regard for what they mean" actions, they should probably be in cyr_dbtool instead.

@elliefm
Copy link
Contributor Author

elliefm commented Jun 3, 2022

The 3.4 version of this from #3384 has a bug: mbentry->uniqueid and mbentry->legacy_specialuse can be NULL. When printed with printf(), these print as (null), and when read back with sscanf(), they're parsed as the string "(null)".

The trivial fix is to do something like mbentry->uniqueid ? mbentry->uniqueid : "" in the printf() arguments, so that an empty string is printed rather than "(null)". But this set of fields is separated by space characters, and sscanf() discards leading space characters when parsing fields. I haven't tested/proven this, but I believe it means that when uniqueid is printed as an empty string, the undump parser will end up misinterpreting the space following it as "leading space", and then try to read the following field (mtime) into the uniqueid. The remaining fields up until legacy_specialuse are all numeric, so them being read off-by-one won't be immediately obviously wrong.

legacy_specialuse being empty wouldn't cause any additional problems, since it's the end of the line anyway.

So anyway, we need to not produce "(null)", but if we produce empty fields instead then we need to adjust the parser. I think this can be addressed by switching the uniqueid field in the sscanf format string from the simple %ms to a %m[...], where the [...] contains the set of all characters valid in a uniqueid. So that's what I'll try next -- though I'll also want to write some tests for all this! Though that means it's looking unlikely that it'll be ready for 3.4.4, and might have to wait for 3.4.5.

My WIP branch for getting this stuff ready for 3.4 is here: cyrus-imapd-3.4...elliefm:v34/3896-ctl_mboxlist-dump-undump

@elliefm elliefm added 3.4 affects 3.4 and removed 3.6 affects 3.6 labels Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.4 affects 3.4
Projects
None yet
Development

No branches or pull requests

1 participant