
tools/cephfs: add tmap_upgrade #7003

Merged: 2 commits merged into ceph:master from wip-cephfs-tmap-migrate on Mar 10, 2016

Conversation

@jcsp (Contributor) commented Dec 21, 2015

Because TMAP support will go away in Jewel+1, we need a way to ensure anyone upgrading an ancient CephFS filesystem can make sure all their TMAPs have gone away.

Signed-off-by: John Spray <john.spray@redhat.com>

@jcsp added the cephfs and feature labels Dec 21, 2015
@jcsp (Contributor Author) commented Dec 21, 2015

(testing this could be kind of interesting... dunno if any of the upgrade tests cover a version old enough to still use tmaps)

@dzafman (Contributor) commented Dec 21, 2015

LGTM. Can we get a simple unit test that forces tmap to be used, then cleans up with "tmap_upgrade"?

}
}

return overall_r;
Contributor

I guess overall_r can never be 0, because the metadata pool does not store only dirfrag objects. I think we should filter out the non-dirfrag objects.

Contributor Author

To keep it super-simple I've just changed it to ignore the EINVALs that we get on non-omap/tmap objects.
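Roughly like this (a minimal sketch, not the actual diff; upgrade_object is a hypothetical stand-in for whatever issues the TMAP2OMAP op against a single object):

// Minimal sketch, not the actual diff: skip objects that are not
// tmap/omap-bearing instead of failing the whole scan.
#include <cerrno>
#include <string>
#include "include/rados/librados.hpp"

// Hypothetical helper: issues the per-object upgrade op.
int upgrade_object(librados::IoCtx &ioctx, const std::string &oid);

int upgrade_pool(librados::IoCtx &ioctx)
{
  int overall_r = 0;
  for (auto i = ioctx.nobjects_begin(); i != ioctx.nobjects_end(); ++i) {
    int r = upgrade_object(ioctx, i->get_oid());
    if (r == -EINVAL) {
      continue;  // not a tmap/omap object (e.g. a journal chunk): ignore
    } else if (r < 0) {
      overall_r = r;  // remember the failure but keep scanning
    }
  }
  return overall_r;
}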

@jcsp force-pushed the wip-cephfs-tmap-migrate branch 2 times, most recently from 15f09f0 to fecbc77 on January 4, 2016 13:33
@jcsp (Contributor Author) commented Jan 4, 2016

I've added a test now. It's kind of awkward because the only way to create tmaps is using librados, so instead of being a normal ceph-qa-suite python test this is a C++ test in the style of the librados tests.
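The gist of the approach is below (a simplified sketch, not the test as committed; it assumes the classic tmap body layout of an encoded header bufferlist followed by an encoded map<string, bufferlist>, and "foo_head" is a made-up key):

// Hypothetical sketch: force a legacy TMAP into existence by writing
// the raw tmap encoding directly, then (after running
// cephfs-data-scan tmap_upgrade) check the keys come back via omap.
#include <cerrno>
#include <map>
#include <string>
#include "include/rados/librados.hpp"
#include "include/encoding.h"

int make_legacy_tmap(librados::IoCtx &ioctx, const std::string &oid)
{
  librados::bufferlist header;                      // empty tmap header
  std::map<std::string, librados::bufferlist> kvs;  // one dummy key
  kvs["foo_head"].append("bar");

  librados::bufferlist body;
  ::encode(header, body);   // tmap body = header + key/value map
  ::encode(kvs, body);
  return ioctx.write_full(oid, body);  // the object data *is* the tmap
}

int check_upgraded(librados::IoCtx &ioctx, const std::string &oid)
{
  std::map<std::string, librados::bufferlist> vals;
  int r = ioctx.omap_get_vals(oid, "", 1024, &vals);
  if (r < 0)
    return r;
  return vals.count("foo_head") ? 0 : -ENOENT;  // key must now be omap
}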

@jcsp (Contributor Author) commented Jan 8, 2016

test this please

@dzafman (Contributor) commented Jan 8, 2016

I built v0.74 and created a cluster with an MDS, then created a whole bunch of directories. After that I checked out hammer and built it; I just started the OSDs and didn't do anything with the MDS. Then I fetched your branch, built that, started up the cluster, and executed "./cephfs-data-scan tmap_upgrade metadata"

$ ./ceph mds stat
e16: 0/1/1 up, 1 up:standby, 1 damaged

I did note that a bunch of files in the filestore that originally had a 230-byte length (objects for empty directories with tmaps) are now empty, which means the conversion from tmap to omap must have happened.

@jcsp (Contributor Author) commented Jan 10, 2016

@dzafman hmm, did you happen to keep the MDS logs? Would be useful to know what triggered the 'damaged'

@gregsfortytwo (Member)

I guess we're blocked until @jcsp gets those logs?

@dzafman (Contributor) commented Jan 26, 2016

@jcsp Here's the log output of the thread in the MDS that declared damage, after executing ./cephfs-data-scan tmap_upgrade metadata and starting up the MDS (I actually started all daemons):

2016-01-26 12:28:39.914579 7f9bc9ffb700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2016-01-26 12:28:39.914584 7f9bc9ffb700 10 mds.0.inotable: load_2 got 34 bytes
2016-01-26 12:28:39.914586 7f9bc9ffb700 10 mds.0.inotable: load_2 loaded v4000
2016-01-26 12:28:39.914718 7f9bc9ffb700 10 MDSIOContextBase::complete: 12C_IO_SM_Load
2016-01-26 12:28:39.914725 7f9bc9ffb700  4 mds.0.sessionmap _load_finish: header missing, loading legacy...
2016-01-26 12:28:39.914728 7f9bc9ffb700 10 mds.0.sessionmap load_legacy
2016-01-26 12:28:39.914748 7f9bc9ffb700  1 -- 127.0.0.1:6812/29414 --> 127.0.0.1:6804/29393 -- osd_op(mds.0.2:5 1.3270c60b mds0_sessionmap [read 0~0] snapc 0=[] ack+read+known_if_redirected+full_force e25) v7 -- ?+0 0x7f9ba80013b0 con 0x7f9bd00252f0
2016-01-26 12:28:39.915361 7f9bc9ffb700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2016-01-26 12:28:39.915366 7f9bc9ffb700 10 mds.0.snaptable: load_2 got 0 bytes
2016-01-26 12:28:39.916248 7f9bc9ffb700 -1 log_channel(cluster) log [ERR] : error decoding table object 'mds_snaptable': buffer::end_of_buffer
2016-01-26 12:28:39.916257 7f9bc9ffb700 10 mds.beacon.a set_want_state: up:replay -> down:damaged
2016-01-26 12:28:39.916259 7f9bc9ffb700 10 log_client  log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
2016-01-26 12:28:39.916264 7f9bc9ffb700 10 log_client  will send 2016-01-26 12:28:39.916255 mds.0 127.0.0.1:6812/29414 1 : cluster [ERR] error decoding table object 'mds_snaptable': buffer::end_of_buffer
2016-01-26 12:28:39.916283 7f9bc9ffb700 10 monclient: _send_mon_message to mon.a at 127.0.0.1:6789/0
2016-01-26 12:28:39.916285 7f9bc9ffb700  1 -- 127.0.0.1:6812/29414 --> 127.0.0.1:6789/0 -- log(1 entries from seq 1 at 2016-01-26 12:28:39.916255) v1 -- ?+0 0x7f9ba8001d30 con 0x7f9be924fc20
2016-01-26 12:28:39.916318 7f9bc9ffb700 10 mds.beacon.a _send down:damaged seq 2
2016-01-26 12:28:39.916324 7f9bc9ffb700 10 monclient: _send_mon_message to mon.a at 127.0.0.1:6789/0
2016-01-26 12:28:39.916326 7f9bc9ffb700  1 -- 127.0.0.1:6812/29414 --> 127.0.0.1:6789/0 -- mdsbeacon(24197/a down:damaged seq 2 v9) v4 -- ?+0 0x7f9ba8002240 con 0x7f9be924fc20
2016-01-26 12:28:39.916329 7f9bc9ffb700 20 mds.beacon.a send_and_wait: awaiting 2 for up to 5s

@jcsp (Contributor Author) commented Jan 28, 2016

OK, I think I was overestimating the safety of running TMAP2OMAP on arbitrary objects. The TMAP loading code will accept anything that looks vaguely decodable as a couple of bufferlists, and if it sees something it thinks it can load, it will truncate the body of the object.
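
Something like this, illustratively (a sketch of my understanding, not the literal OSD code):

// Why TMAP2OMAP is unsafe on arbitrary objects: the tmap decode path
// is just two generic decodes, so near-arbitrary bytes can "succeed",
// after which the keys move to omap and the body is truncated to zero.
bool looks_like_tmap(bufferlist &object_data)
{
  try {
    bufferlist header;
    std::map<std::string, bufferlist> kvs;
    bufferlist::iterator p = object_data.begin();
    ::decode(header, p);  // any length-prefixed blob passes
    ::decode(kvs, p);     // a u32 count plus pairs; garbage can still fit
    return true;
  } catch (const buffer::error &e) {
    return false;         // e.g. buffer::end_of_buffer, as in the log above
  }
}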

@jcsp (Contributor Author) commented Jan 29, 2016

Updated: should work properly this time!

@dzafman (Contributor) commented Jan 29, 2016

@jcsp After the tmap upgrade the MDS crashes with an assert. I've attached a log in tracker: http://tracker.ceph.com/issues/13768 .

Also, after the upgrade I found that all but one of the 230-byte objects, which I believe to be the tmaps, are zero length as they should be; the exception is inode 2. Looks like the code missed the root directory.

dzafman$ find dev -type f -ls| grep " 230 "
846221 8 -rw-r--r-- 1 dzafman dzafman 230 Jan 29 12:52 dev/osd1/current/1.7_head/DIR_7/2.00000000__head_96F33707__1
852106 8 -rw-r--r-- 1 dzafman dzafman 230 Jan 29 12:52 dev/osd2/current/1.7_head/DIR_7/2.00000000__head_96F33707__1
846220 8 -rw-r--r-- 1 dzafman dzafman 230 Jan 29 12:52 dev/osd0/current/1.7_head/DIR_7/2.00000000__head_96F33707__1

@jcsp (Contributor Author) commented Feb 1, 2016

Ah, it's MDS_INO_CEPH (i.e. /.ceph); I've added that now, thanks for spotting it.
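
For the record, the system directories need to be treated as dirfrags too; a hypothetical sketch using the names from mdstypes.h (not the exact patch):

// Hypothetical sketch: inodes whose dirfrag objects must be included
// even though they aren't ordinary user directories.
bool is_system_dirfrag(inodeno_t ino)
{
  return ino == MDS_INO_ROOT        // "/"      -> object 1.00000000
      || ino == MDS_INO_CEPH        // "/.ceph" -> object 2.00000000
      || MDS_INO_IS_MDSDIR(ino)     // per-rank ~mdsN directories
      || MDS_INO_IS_STRAY(ino);     // stray directories
}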

However, I don't see a connection between that and the assertion:

osdc/Journaler.cc: 431: FAILED assert(last_written.write_pos >= last_written.expire_pos)

I suspect that could be from some other, unrelated upgrade bug.

@dzafman do you by any chance have a "rados export" of the metadata pool from before tmap_upgrade was run so that I can debug this?

@dzafman (Contributor) commented Feb 1, 2016

@jcsp No, I don't have a rados export, but I will generate one. I'm going to upgrade to Jewel and start the MDS to see if it crashes; if it does not, I'll create exports and then switch to your branch and try the tmap_upgrade.

@dzafman (Contributor) commented Feb 2, 2016

@jcsp Manual testing passed.

Upgrade path: Dumpling build (created a bunch of directories) -> Firefly -> Hammer -> wip-cephfs-tmap-migrate. Running "./cephfs-data-scan tmap_upgrade metadata" appeared to work. I then stopped the MDS and added an assert in do_osd_ops() for CEPH_OSD_OP_TMAPGET; it was never hit while traversing the entire directory tree.
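
i.e. roughly this, inside ReplicatedPG::do_osd_ops() (a sketch from memory, not the exact local diff):

// Temporary debugging assert, never committed: blow up if any client
// still issues a tmap read after the upgrade. It never fired.
case CEPH_OSD_OP_TMAPGET:
  assert(0 == "client issued TMAPGET after tmap_upgrade");
  break;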

@jcsp assigned gregsfortytwo and unassigned dzafman Feb 2, 2016
@jcsp (Contributor Author) commented Feb 2, 2016

@dzafman great, thanks for your persistence.

@gregsfortytwo (Member)

@jcsp, @dzafman Presumably we'll need some upstream docs for users on how to do the upgrade?
@ukernel, can I take from your comment that you reviewed this?

@dzafman (Contributor) commented Feb 2, 2016

@gregsfortytwo @jcsp Is there a reason we don't do the tmap upgrade automatically, since we have to restart the MDS anyway?

@gregsfortytwo (Member)

On Tue, Feb 2, 2016 at 12:31 PM, David Zafman notifications@github.com wrote:

@gregsfortytwo @jcsp Is there a reason we don't do the tmap upgrade automatically, since we have to restart the MDS anyway?

The MDS already switches anything it sees and changes it over to omap. The tool (I presume; I didn't actually look) has to crawl the whole pool and upgrade everything.

@jcsp (Contributor Author) commented Feb 2, 2016

Right: the tool is so that the MDS doesn't have to traverse the entire metadata tree and do tmap upgrades. We could have built it into forward scrub and done it in the background, but this was way simpler.

@dzafman (Contributor) commented Feb 2, 2016

@gregsfortytwo We can't remove tmap in Jewel+1 unless we require an upgrade to Jewel (and running the tool) before the upgrade to Jewel+1.

Will we want to allow an upgrade from Hammer directly to Jewel+1?

@liewegas (Member) commented Feb 2, 2016 via email

@gregsfortytwo (Member) commented Feb 2, 2016 via email

gregsfortytwo added a commit that referenced this pull request Feb 3, 2016
gregsfortytwo added a commit that referenced this pull request Feb 8, 2016
@jcsp assigned jcsp and unassigned dzafman Feb 9, 2016
@jcsp (Contributor Author) commented Feb 9, 2016

Assigned to self to make doc updates

@jcsp (Contributor Author) commented Feb 29, 2016

Added docs; this is good to go. I have added a recurring note to my calendar to send out an email and make sure the release notes mention it at the point we release Jewel.

@jcsp assigned gregsfortytwo and unassigned jcsp Feb 29, 2016
@gregsfortytwo (Member)

I think you can just update the PendingReleaseNotes file?

This is part of the run-up to removing all TMAP code in the Jewel+1 cycle.

Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp (Contributor Author) commented Mar 1, 2016

@gregsfortytwo Good point, I forgot about that file. Updated.

gregsfortytwo added a commit that referenced this pull request Mar 8, 2016
…into greg-fs-testing

#7003

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo (Member)

http://qa-proxy.ceph.com/teuthology/gregf-2016-03-07_23:12:27-fs-greg-fs-testing-3-7-safe---basic-mira/
5 failures, largely infrastructure issues; all known and consistent around this time period.

gregsfortytwo added a commit that referenced this pull request Mar 10, 2016
tools/cephfs: add tmap_upgrade

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo merged commit 9ae8486 into ceph:master Mar 10, 2016