
tools/cephfs: add tmap_upgrade #7003

Merged: 2 commits merged into ceph:master from wip-cephfs-tmap-migrate on Mar 10, 2016

Conversation

@jcsp (Contributor) commented Dec 21, 2015

Because TMAP support will go away in Jewel+1, we need a way to ensure anyone upgrading an ancient CephFS filesystem can make sure all their TMAPs have gone away.

Signed-off-by: John Spray <john.spray@redhat.com>

@jcsp added the cephfs and feature labels Dec 21, 2015
@jcsp (Contributor Author) commented Dec 21, 2015

(testing this could be kind of interesting... dunno if any of the upgrade tests cover a version old enough to still use tmaps)

@dzafman (Contributor) commented Dec 21, 2015

LGTM. Can we get a simple unit test that forces tmap to be used, then cleans up with "tmap_upgrade"?

}
}

return overall_r;
Contributor

I guess overall_r can never be 0, because the metadata pool does not store only dirfrag objects. I think we should filter out the non-dirfrag objects.

Contributor Author

To keep it super-simple I've just changed it to ignore the EINVALs that we get on non-omap/tmap objects.
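Roughly like this (a minimal sketch, not the actual diff; upgrade_object is a hypothetical stand-in for whatever issues the TMAP2OMAP op against a single object):

// Minimal sketch, not the actual diff: skip objects that are not
// tmap/omap-bearing instead of failing the whole scan.
#include <cerrno>
#include <string>
#include "include/rados/librados.hpp"

// Hypothetical helper: issues the per-object upgrade op.
int upgrade_object(librados::IoCtx &ioctx, const std::string &oid);

int upgrade_pool(librados::IoCtx &ioctx)
{
  int overall_r = 0;
  for (auto i = ioctx.nobjects_begin(); i != ioctx.nobjects_end(); ++i) {
    int r = upgrade_object(ioctx, i->get_oid());
    if (r == -EINVAL) {
      continue;  // not a tmap/omap object (e.g. a journal chunk): ignore
    } else if (r < 0) {
      overall_r = r;  // remember the failure but keep scanning
    }
  }
  return overall_r;
}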

@jcsp force-pushed the wip-cephfs-tmap-migrate branch 2 times, most recently from 15f09f0 to fecbc77 on January 4, 2016 13:33
@jcsp (Contributor Author) commented Jan 4, 2016

I've added a test now. It's kind of awkward because the only way to create tmaps is using librados, so instead of being a normal ceph-qa-suite python test this is a C++ test in the style of the librados tests.
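The gist of the approach is below (a simplified sketch, not the test as committed; it assumes the classic tmap body layout of an encoded header bufferlist followed by an encoded map<string, bufferlist>, and "foo_head" is a made-up key):

// Hypothetical sketch: force a legacy TMAP into existence by writing
// the raw tmap encoding directly, then (after running
// cephfs-data-scan tmap_upgrade) check the keys come back via omap.
#include <cerrno>
#include <map>
#include <string>
#include "include/rados/librados.hpp"
#include "include/encoding.h"

int make_legacy_tmap(librados::IoCtx &ioctx, const std::string &oid)
{
  librados::bufferlist header;                      // empty tmap header
  std::map<std::string, librados::bufferlist> kvs;  // one dummy key
  kvs["foo_head"].append("bar");

  librados::bufferlist body;
  ::encode(header, body);   // tmap body = header + key/value map
  ::encode(kvs, body);
  return ioctx.write_full(oid, body);  // the object data *is* the tmap
}

int check_upgraded(librados::IoCtx &ioctx, const std::string &oid)
{
  std::map<std::string, librados::bufferlist> vals;
  int r = ioctx.omap_get_vals(oid, "", 1024, &vals);
  if (r < 0)
    return r;
  return vals.count("foo_head") ? 0 : -ENOENT;  // key must now be omap
}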

@jcsp (Contributor Author) commented Jan 8, 2016

test this please

@dzafman (Contributor) commented Jan 8, 2016

I built v0.74 and created a cluster with an MDS, then created a whole bunch of directories. After that I checked out hammer and built it; I just started the OSDs and didn't do anything with the MDS. Then I fetched your branch, built that, started up the cluster, and executed "./cephfs-data-scan tmap_upgrade metadata"

$ ./ceph mds stat
e16: 0/1/1 up, 1 up:standby, 1 damaged

I did note that a bunch of files in the filestore that originally had a 230-byte length (objects for empty directories with tmaps) are now empty, which means the conversion from tmap to omap must have happened.

@jcsp (Contributor Author) commented Jan 10, 2016

@dzafman hmm, did you happen to keep the MDS logs? Would be useful to know what triggered the 'damaged'

@gregsfortytwo (Member)

I guess we're blocked until @jcsp gets those logs?

@dzafman (Contributor) commented Jan 26, 2016

@jcsp Here's the log output of the thread in the MDS that declared damage, after executing ./cephfs-data-scan tmap_upgrade metadata and starting up the MDS (I actually started all daemons):

2016-01-26 12:28:39.914579 7f9bc9ffb700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2016-01-26 12:28:39.914584 7f9bc9ffb700 10 mds.0.inotable: load_2 got 34 bytes
2016-01-26 12:28:39.914586 7f9bc9ffb700 10 mds.0.inotable: load_2 loaded v4000
2016-01-26 12:28:39.914718 7f9bc9ffb700 10 MDSIOContextBase::complete: 12C_IO_SM_Load
2016-01-26 12:28:39.914725 7f9bc9ffb700  4 mds.0.sessionmap _load_finish: header missing, loading legacy...
2016-01-26 12:28:39.914728 7f9bc9ffb700 10 mds.0.sessionmap load_legacy
2016-01-26 12:28:39.914748 7f9bc9ffb700  1 -- 127.0.0.1:6812/29414 --> 127.0.0.1:6804/29393 -- osd_op(mds.0.2:5 1.3270c60b mds0_sessionmap [read 0~0] snapc 0=[] ack+read+known_if_redirected+full_force e25) v7 -- ?+0 0x7f9ba80013b0 con 0x7f9bd00252f0
2016-01-26 12:28:39.915361 7f9bc9ffb700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2016-01-26 12:28:39.915366 7f9bc9ffb700 10 mds.0.snaptable: load_2 got 0 bytes
2016-01-26 12:28:39.916248 7f9bc9ffb700 -1 log_channel(cluster) log [ERR] : error decoding table object 'mds_snaptable': buffer::end_of_buffer
2016-01-26 12:28:39.916257 7f9bc9ffb700 10 mds.beacon.a set_want_state: up:replay -> down:damaged
2016-01-26 12:28:39.916259 7f9bc9ffb700 10 log_client  log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
2016-01-26 12:28:39.916264 7f9bc9ffb700 10 log_client  will send 2016-01-26 12:28:39.916255 mds.0 127.0.0.1:6812/29414 1 : cluster [ERR] error decoding table object 'mds_snaptable': buffer::end_of_buffer
2016-01-26 12:28:39.916283 7f9bc9ffb700 10 monclient: _send_mon_message to mon.a at 127.0.0.1:6789/0
2016-01-26 12:28:39.916285 7f9bc9ffb700  1 -- 127.0.0.1:6812/29414 --> 127.0.0.1:6789/0 -- log(1 entries from seq 1 at 2016-01-26 12:28:39.916255) v1 -- ?+0 0x7f9ba8001d30 con 0x7f9be924fc20
2016-01-26 12:28:39.916318 7f9bc9ffb700 10 mds.beacon.a _send down:damaged seq 2
2016-01-26 12:28:39.916324 7f9bc9ffb700 10 monclient: _send_mon_message to mon.a at 127.0.0.1:6789/0
2016-01-26 12:28:39.916326 7f9bc9ffb700  1 -- 127.0.0.1:6812/29414 --> 127.0.0.1:6789/0 -- mdsbeacon(24197/a down:damaged seq 2 v9) v4 -- ?+0 0x7f9ba8002240 con 0x7f9be924fc20
2016-01-26 12:28:39.916329 7f9bc9ffb700 20 mds.beacon.a send_and_wait: awaiting 2 for up to 5s

@jcsp (Contributor Author) commented Jan 28, 2016

OK, I think I was overestimating the safety of running TMAP2OMAP on arbitrary objects. The TMAP loading code will accept anything that looks vaguely decodable as a couple of bufferlists, and if it sees something it thinks it can load, it will truncate the body of the object.
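
Something like this, illustratively (a sketch of my understanding, not the literal OSD code):

// Why TMAP2OMAP is unsafe on arbitrary objects: the tmap decode path
// is just two generic decodes, so near-arbitrary bytes can "succeed",
// after which the keys move to omap and the body is truncated to zero.
bool looks_like_tmap(bufferlist &object_data)
{
  try {
    bufferlist header;
    std::map<std::string, bufferlist> kvs;
    bufferlist::iterator p = object_data.begin();
    ::decode(header, p);  // any length-prefixed blob passes
    ::decode(kvs, p);     // a u32 count plus pairs; garbage can still fit
    return true;
  } catch (const buffer::error &e) {
    return false;         // e.g. buffer::end_of_buffer, as in the log above
  }
}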

@jcsp (Contributor Author) commented Jan 29, 2016

Updated: should work properly this time!

@dzafman (Contributor) commented Jan 29, 2016

@jcsp After the tmap upgrade the MDS crashes with an assert. I've attached a log in tracker: http://tracker.ceph.com/issues/13768 .

Also, after the upgrade I found that all but one of the 230-byte objects, which I believe to be the tmaps, are zero length as they should be; the exception is inode 2. Looks like the code missed the root directory.

dzafman$ find dev -type f -ls| grep " 230 "
846221 8 -rw-r--r-- 1 dzafman dzafman 230 Jan 29 12:52 dev/osd1/current/1.7_head/DIR_7/2.00000000__head_96F33707__1
852106 8 -rw-r--r-- 1 dzafman dzafman 230 Jan 29 12:52 dev/osd2/current/1.7_head/DIR_7/2.00000000__head_96F33707__1
846220 8 -rw-r--r-- 1 dzafman dzafman 230 Jan 29 12:52 dev/osd0/current/1.7_head/DIR_7/2.00000000__head_96F33707__1

@jcsp (Contributor Author) commented Feb 1, 2016

Ah, it's MDS_INO_CEPH (i.e. /.ceph); I've added that now, thanks for spotting it.
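
For the record, the system directories need to be treated as dirfrags too; a hypothetical sketch using the names from mdstypes.h (not the exact patch):

// Hypothetical sketch: inodes whose dirfrag objects must be included
// even though they aren't ordinary user directories.
bool is_system_dirfrag(inodeno_t ino)
{
  return ino == MDS_INO_ROOT        // "/"      -> object 1.00000000
      || ino == MDS_INO_CEPH        // "/.ceph" -> object 2.00000000
      || MDS_INO_IS_MDSDIR(ino)     // per-rank ~mdsN directories
      || MDS_INO_IS_STRAY(ino);     // stray directories
}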

However, I don't see a connection between that and the assertion:

osdc/Journaler.cc: 431: FAILED assert(last_written.write_pos >= last_written.expire_pos)

I suspect that could be from some other, unrelated upgrade bug.

@dzafman do you by any chance have a "rados export" of the metadata pool from before tmap_upgrade was run so that I can debug this?

@dzafman (Contributor) commented Feb 1, 2016

@jcsp No, I don't have a rados export, but I will generate one. I'm going to upgrade to Jewel and start the MDS to see if it crashes; if it does not, I'll create exports and then switch to your branch and try the tmap_upgrade.

@dzafman (Contributor) commented Feb 2, 2016

@jcsp Manual testing passed.

Upgrade path: Dumpling build (created a bunch of directories) -> Firefly -> Hammer -> wip-cephfs-tmap-migrate. Running "./cephfs-data-scan tmap_upgrade metadata" appeared to work. I then stopped the MDS and added an assert in do_osd_ops() for CEPH_OSD_OP_TMAPGET; it was never hit while traversing the entire directory tree.
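
i.e. roughly this, inside ReplicatedPG::do_osd_ops() (a sketch from memory, not the exact local diff):

// Temporary debugging assert, never committed: blow up if any client
// still issues a tmap read after the upgrade. It never fired.
case CEPH_OSD_OP_TMAPGET:
  assert(0 == "client issued TMAPGET after tmap_upgrade");
  break;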

@jcsp assigned gregsfortytwo and unassigned dzafman Feb 2, 2016
@jcsp (Contributor Author) commented Feb 2, 2016

@dzafman great, thanks for your persistence.

@gregsfortytwo (Member)

@jcsp, @dzafman Presumably we'll need some upstream docs for users on how to do the upgrade?
@ukernel, can I take from your comment that you reviewed this?

@dzafman (Contributor) commented Feb 2, 2016

@gregsfortytwo @jcsp Is there a reason we don't do the tmap upgrade automatically, since we have to restart the MDS anyway?

@gregsfortytwo (Member)

On Tue, Feb 2, 2016 at 12:31 PM, David Zafman notifications@github.com wrote:

@gregsfortytwo @jcsp Is there a reason we don't do the tmap upgrade automatically, since we have to restart the MDS anyway?

The MDS already switches anything it sees and changes it over to omap. The tool (I presume; I didn't actually look) has to crawl the whole pool and upgrade everything.

@jcsp (Contributor Author) commented Feb 2, 2016

Right: the tool is so that the MDS doesn't have to traverse the entire metadata tree and do tmap upgrades. We could have built it into forward scrub and done it in the background, but this was way simpler.

@dzafman (Contributor) commented Feb 2, 2016

@gregsfortytwo We can't remove tmap in Jewel+1 unless we require an upgrade to Jewel (and running the tool) before the upgrade to Jewel+1.

Will we want to allow an upgrade from Hammer directly to Jewel+1?

@liewegas (Member) commented Feb 2, 2016 via email

@gregsfortytwo (Member) commented Feb 2, 2016 via email

gregsfortytwo added a commit that referenced this pull request Feb 3, 2016
gregsfortytwo added a commit that referenced this pull request Feb 8, 2016
@jcsp assigned jcsp and unassigned dzafman Feb 9, 2016
@jcsp (Contributor Author) commented Feb 9, 2016

Assigned to self to make doc updates

@jcsp (Contributor Author) commented Feb 29, 2016

Added docs; this is good to go. I have added a recurring note to my calendar to send out an email and make sure the release notes mention it at the point we release Jewel.

@jcsp assigned gregsfortytwo and unassigned jcsp Feb 29, 2016
@gregsfortytwo (Member)

I think you can just update the PendingReleaseNotes file?

This is part of the run-up to removing all TMAP code in the Jewel+1 cycle.

Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp (Contributor Author) commented Mar 1, 2016

@gregsfortytwo Good point, I forgot about that file. Updated.

gregsfortytwo added a commit that referenced this pull request Mar 8, 2016
…into greg-fs-testing

#7003

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo (Member)

http://qa-proxy.ceph.com/teuthology/gregf-2016-03-07_23:12:27-fs-greg-fs-testing-3-7-safe---basic-mira/
5 failures, largely infrastructure issues; all known and consistent around this time period.

gregsfortytwo added a commit that referenced this pull request Mar 10, 2016
tools/cephfs: add tmap_upgrade

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo merged commit 9ae8486 into ceph:master Mar 10, 2016