bcache and vdo do not work #9
Comments
@corwin, do you have time to look into this issue?
Please describe in a little more detail what you were doing to trigger the error. Also, looking a little at the bcache code in the function that reports that error, I see some possible issues with size handling. Is the error dependent on the logical size of the VDO device? Does it work for, say, 15 GB logical? (Or if you put an LVM logical volume of 15 GB on top of VDO, and put bcache on top of that?)
@raeburn I tried to reproduce it just now, but everything works. In a VM I created two devices: 10 GB for the cache and 50 GB for VDO. I ran two fio test jobs with --dedupe_percentage=80, and vdostats --si showed an 80% space saving. This issue can be closed.
I got the sizes wrong in my previous message. I think 8 TB and 16 TB are interesting boundary sizes to look at. One of my co-workers is taking a look right now.
I was able to recreate this on Fedora 28, running kernel version 4.17.7-200.fc28.x86_64 and kvdo module version 6.2.0.132. If I use a 33 GB partition as a cache device, and a VDO volume with a logical size of just over 8 TB ("--vdoLogicalSize=8388609M"), I can see the same error:

kernel: bcache: bcache_device_init() nr_stripes too large or invalid: 2147483902 (start sector beyond end of disk?)

Interestingly, I can also see the same error for a VDO volume of 9 TB:

vdo create --name=vdo1 --device=/dev/sdd1 --vdoLogicalSize=9T

...and for a VDO volume of 25 TB:

vdo create --name=vdo1 --device=/dev/sdd1 --vdoLogicalSize=25T

(Note that the "nr_stripes" value is identical between the 9 TB VDO volume and the 25 TB VDO volume, which suggests a rollover.)
Thanks, Bryan! That confirms my suspicions.

The issues I noticed with the bcache initialization (bcache_device_init, drivers/md/bcache/super.c) and VDO, in the version I looked at:

A "stripe" is chosen to be the optimal I/O size of the underlying device (cached_dev_init). For RAID 5/6 devices this makes a lot of sense; the stripe size will be tens or hundreds of kB or more, and writing a stripe all at once is more efficient. For VDO there's no such grouping happening under the covers, but partial or misaligned blocks carry a penalty because they require read-modify-write cycles, so our optimal I/O size is our block size, 4 kB.

The bcache initialization code computes the number of stripes (device size divided by stripe size) and caps it at INT_MAX on a 64-bit system (or less, for a 32-bit system). This means 2**31 stripes, or 8 TB at a 4 kB block size, would exceed the limit.

Also, the quotient from the calculation is stuffed into a 32-bit (type "unsigned") field and read back out for comparison against the maximum. So if the quotient exceeds 32 bits (16 TB, at a 4 kB block size), the computed nr_stripes (and thus the sizes of the stripe_sectors_dirty and full_dirty_stripes allocations) will be wrong. I'm suspicious about out-of-bounds references in such cases, but haven't dug into this.

(Bryan mentioned to me that creating a bcache device atop VDO worked if the VDO logical size was 18 TB or 34 TB, but failed in the cases above. So it seems to be a question of whether the VDO logical size, mod 16 TB, is greater or less than 8 TB: less, and initialization works; greater, and it fails. That matches my expectations.)

I suppose one might argue whether VDO should declare an optimal block size at all, but it seems pretty clear that bcache has a problem with large devices that have small optimal I/O sizes.
Bryan and I were speculating that an MD device, or some other device with a lot of storage and a tiny chunk size, might replicate this problem without VDO in the mix, which would be a little more persuasive that it's not a VDO problem. Perhaps some other caching driver like dm-writecache might work better...
I still have everything working =)

sdb - 30Gb --> for cache device

#Create VDO device 50Tb
#Disable bcache sequential cutoff and threshold
#mkfs ext4
#Mount bcache device
#Create random 10G file
#DD to /mnt 10 copy of file io
#Before DD
#After DD
I should clarify the comments from yesterday. From raeburn: "So it seems to be a question of whether the VDO logical size, [modulo] 16TB, is greater or less than 8TB; less, initialization works, greater, it fails."

In other words, if you create a bcache volume on top of a VDO volume with a logical size of approximately... ...and so on, and so on.

The failure appears to happen when "nr_stripes" is between 2147483648 and 4294967295. bcache interprets the VDO volume as a device with "stripes" of 4096 bytes, and calculates the number of stripes in the device ("nr_stripes"). An 8 TB "device" with a "stripe size" of 4096 bytes has 2147483648 "stripes", which overflows a 32-bit signed counter. The counter resets to zero at sizes divisible by 16 TB, which produces the success/failure pattern I detailed above.
@bgurney-rh Many thanks for the clarification. I rechecked and tested; everything works exactly as you described.
When I create bcache on top of a VDO device whose logical size is bigger than the original drive, I get this message:
Is this a bug in vdo or in bcache?