Qu-Wenruo/btrf…
Commits on Jan 14, 2016
-
btrfs: dedup: add per-file online dedup control
Introduce inode_need_dedup() to implement per-file online dedup control. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
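The patch body is not included in this log, but a minimal sketch of what such a per-file check could look like is shown below. The fs_info->dedup_info field and the exact flag handling are assumptions modeled on how other per-inode flags (such as NODATACOW) are treated, not the patchset's actual code.

/* Sketch only (assumes btrfs' internal headers, e.g. ctree.h and
 * btrfs_inode.h); fs_info->dedup_info is an assumed field. */
static inline int inode_need_dedup(struct btrfs_fs_info *fs_info,
				   struct inode *inode)
{
	/* dedup not enabled on this filesystem at all */
	if (!fs_info->dedup_info)
		return 0;
	/* per-file opt-out, assuming the flag sits next to NODATACOW & co. */
	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODEDUP)
		return 0;
	return 1;
}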
-
btrfs: dedup: add a property handler for online dedup
We use the btrfs extended attribute "btrfs.dedup" to record the per-file online dedup status, so add a dedup property handler. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedup: add an inode nodedup flag
Introduce the BTRFS_INODE_NODEDUP flag so that online data deduplication can be explicitly disabled for specified files. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedup: Add ioctl for inband deduplication
Add an ioctl interface for inband deduplication, which includes: 1) enable 2) disable 3) status. An ioctl to disable inband dedup for a given file/dir will be added later. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
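For context, a hypothetical userspace sketch of driving such an ioctl follows. The ioctl number, argument struct and command values are assumptions for illustration only (the real ABI is defined by the patchset); BTRFS_IOCTL_MAGIC and the ioctl(2) call itself are standard.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>        /* BTRFS_IOCTL_MAGIC */
#include <linux/types.h>

/* Hypothetical ABI -- names, layout and the ioctl number are assumptions. */
struct btrfs_ioctl_dedup_args {
	__u16 cmd;              /* 1 = enable, 2 = disable, 3 = status */
	__u16 backend;          /* 0 = in-memory, 1 = on-disk */
	__u16 hash_type;        /* 0 = SHA256 */
	__u16 pad;
	__u64 blocksize;        /* dedup_bs */
	__u64 limit_nr;         /* in-memory hash count limit */
};
#define BTRFS_IOC_DEDUP_CTL \
	_IOWR(BTRFS_IOCTL_MAGIC, 55, struct btrfs_ioctl_dedup_args)

int main(void)
{
	struct btrfs_ioctl_dedup_args args = {
		.cmd = 1,                       /* enable */
		.backend = 0,                   /* in-memory */
		.hash_type = 0,                 /* SHA256 */
		.blocksize = 128 * 1024,        /* assumed default dedup_bs */
		.limit_nr = 32768,
	};
	int fd = open("/mnt/btrfs", O_RDONLY);  /* example mount point */

	if (fd < 0 || ioctl(fd, BTRFS_IOC_DEDUP_CTL, &args) < 0)
		perror("dedup ioctl (hypothetical)");
	else
		printf("inband dedup enabled\n");
	if (fd >= 0)
		close(fd);
	return 0;
}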
-
btrfs: dedup: Add support for adding hash for on-disk backend
Now the on-disk backend can add hashes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedup: Add support to delete hash for on-disk backend
Now the on-disk backend can delete hashes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedup: Add support for on-disk hash search
Now the on-disk backend is able to search for hashes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedup: Introduce interfaces to resume and cleanup dedup info
Since a new on-disk dedup method is being introduced, add new interfaces to resume a previous dedup setup. And since a new tree is introduced for dedup status, also add a disable handler for it. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedup: Add basic tree structure for on-disk dedup method
Introduce a new tree, the dedup tree, to record on-disk dedup hashes, providing persistent hash storage instead of an in-memory-only implementation. Unlike Liu Bo's implementation, this version does not rely on a hack for the bytenr -> hash search, but adds a new item type, DEDUP_BYTENR_ITEM, for that search case, just like the in-memory backend. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
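A rough sketch of the two lookup directions such item types would serve is given below; only DEDUP_BYTENR_ITEM is named in the commit, so the hash-item name, key packing and item body shown are assumptions for illustration.

/*
 * Dedup tree sketch (assumed key layout, not the patchset's actual format):
 *
 *   (hash_tail_64, BTRFS_DEDUP_HASH_ITEM_KEY, bytenr)
 *       item stores the full SHA256 and the extent length;
 *       used at write time to answer "is this hash already on disk?"
 *
 *   (bytenr, BTRFS_DEDUP_BYTENR_ITEM_KEY, hash_tail_64)
 *       used when the extent at bytenr is freed, to find and delete
 *       the matching hash item without scanning the whole tree.
 */
struct btrfs_dedup_hash_item {          /* hypothetical item body */
	__le64 len;                     /* length of the deduped extent */
	/* followed by the full hash, e.g. 32 bytes for SHA256 */
} __attribute__((packed));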
-
btrfs: dedup: Inband in-memory only de-duplication implement
Core implementation of inband de-duplication. It reuses the async_cow_start() facility to calculate the dedup hash, and uses that hash to de-duplicate at the extent level. The work flow is as follows:
1) Run the delalloc range for an inode.
2) Calculate hashes for the delalloc range in units of dedup_bs.
3) On a hash match (duplicate), just increase the source extent's ref count and insert the file extent. On a hash miss, go through the normal cow_file_range() fallback and add the hash to the dedup tree.
Compression for the hash-miss case is not supported yet. The current implementation stores all dedup hashes in an in-memory rb-tree, with LRU behavior to enforce the limit. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
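As an illustration of the hashing-at-dedup_bs idea (not the kernel code path), here is a small userspace simulation: it splits a file into fixed-size blocks, hashes each block and counts how many blocks an inband dedup pass could have reused. FNV-1a stands in for SHA256, a flat table stands in for the in-kernel rb-tree, and the 128K block size is an assumed value of dedup_bs.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define DEDUP_BS (128 * 1024)   /* assumed dedup_bs */

/* FNV-1a, a stand-in for the SHA256 hash the patchset uses */
static uint64_t block_hash(const unsigned char *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

int main(int argc, char **argv)
{
	static unsigned char buf[DEDUP_BS];
	uint64_t *seen = NULL;
	size_t nseen = 0, dup = 0, total = 0, n, i;
	FILE *f;

	if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	/* only full dedup_bs blocks take part, mirroring the patch's unit */
	while ((n = fread(buf, 1, DEDUP_BS, f)) == DEDUP_BS) {
		uint64_t h = block_hash(buf, n);

		total++;
		for (i = 0; i < nseen && seen[i] != h; i++)
			;
		if (i < nseen) {
			dup++;          /* hash hit: reuse the existing extent */
		} else {                /* hash miss: normal CoW, then record hash */
			seen = realloc(seen, (nseen + 1) * sizeof(*seen));
			if (!seen)
				return 1;
			seen[nseen++] = h;
		}
	}
	printf("%zu of %zu blocks could have been deduplicated\n", dup, total);
	fclose(f);
	free(seen);
	return 0;
}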
-
btrfs: ordered-extent: Add support for dedup
Add ordered-extent support for dedup. Note that the current ordered-extent support only handles non-compressed source extents. Support for compressed source extents will be added later. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedup: Implement btrfs_dedup_calc_hash interface
Unlike the choice of dedup method (in-memory or on-disk), only one hash algorithm, SHA256, is supported so far, so implement the btrfs_dedup_calc_hash() interface using SHA256. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
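The patch body is not shown here, but the kernel's synchronous hash (shash) API is presumably what a SHA256-based btrfs_dedup_calc_hash() builds on. A minimal buffer-based sketch using that API follows; the helper name and error handling are illustrative, and on kernels of this vintage the descriptor's flags field would also need to be initialized.

#include <linux/err.h>
#include <crypto/hash.h>

/* Illustrative helper, not the patchset's btrfs_dedup_calc_hash(). */
static int sha256_digest(const u8 *data, unsigned int len, u8 out[32])
{
	struct crypto_shash *tfm;
	int ret;

	tfm = crypto_alloc_shash("sha256", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	{
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		/* desc->flags = 0; is also needed on 4.x kernels */
		ret = crypto_shash_digest(desc, data, len, out);
	}

	crypto_free_shash(tfm);
	return ret;
}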
-
btrfs: dedup: Introduce function to search for an existing hash
Introduce the static function inmem_search() to handle this job for the in-memory hash tree. The trick is that we must ensure the delayed ref head is not being run at the time we search for the hash. With inmem_search(), we can implement the btrfs_dedup_search() interface. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: delayed-ref: Add support for atomic increasing extent ref
Slightly modify btrfs_add_delayed_data_ref() so that it accepts GFP_ATOMIC and can be called inside a spinlock. This is used by later dedup patches. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedup: Introduce function to remove hash from in-memory tree
Introduce the static function inmem_del() to remove a hash from the in-memory dedup tree, and implement the btrfs_dedup_del() and btrfs_dedup_destroy() interfaces. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedup: Introduce function to add hash into in-memory tree
Introduce the static function inmem_add() to add a hash into the in-memory tree. Now we can implement the btrfs_dedup_add() interface. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
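The commit bodies above only name the interfaces, so the sketch below shows one way an in-memory hash tree with an LRU limit can be structured using the kernel's rb-tree and list helpers; all struct and field names are assumptions, and locking is omitted.

#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/string.h>

/* Assumed node layout: keyed by hash in the rb-tree, strung on an LRU list. */
struct inmem_hash {
	struct rb_node hash_node;       /* rb-tree, ordered by hash bytes */
	struct list_head lru;           /* LRU list, head == most recent */
	u64 bytenr;                     /* extent this hash refers to */
	u32 num_bytes;
	u8 hash[32];                    /* SHA256 */
};

/* Sketch of an inmem_add()-style insert with LRU-based eviction. */
static int inmem_add_sketch(struct rb_root *root, struct list_head *lru_list,
			    unsigned long *nr, unsigned long limit,
			    struct inmem_hash *new)
{
	struct rb_node **p = &root->rb_node;
	struct rb_node *parent = NULL;
	struct inmem_hash *entry;
	int cmp;

	while (*p) {
		parent = *p;
		entry = rb_entry(parent, struct inmem_hash, hash_node);
		cmp = memcmp(new->hash, entry->hash, sizeof(new->hash));
		if (cmp < 0) {
			p = &(*p)->rb_left;
		} else if (cmp > 0) {
			p = &(*p)->rb_right;
		} else {
			/* already known: just refresh its LRU position */
			list_move(&entry->lru, lru_list);
			kfree(new);
			return 0;
		}
	}
	rb_link_node(&new->hash_node, parent, p);
	rb_insert_color(&new->hash_node, root);
	list_add(&new->lru, lru_list);

	if (++(*nr) > limit) {          /* evict the least recently used hash */
		entry = list_last_entry(lru_list, struct inmem_hash, lru);
		rb_erase(&entry->hash_node, root);
		list_del(&entry->lru);
		kfree(entry);
		(*nr)--;
	}
	return 0;
}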
-
btrfs: dedup: Introduce function to initialize dedup info
Add generic function to initialize dedup info. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedup: Introduce dedup framework and its header
Introduce the header for the btrfs online (write-time) de-duplication framework. The new de-duplication framework is going to support 2 different dedup methods and 1 dedup hash algorithm. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
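As a hypothetical sketch of the knobs such a header would expose (the names are assumptions; the real definitions live in the patchset):

/* Two dedup methods and, for now, a single hash algorithm. */
enum btrfs_dedup_backend {
	BTRFS_DEDUP_BACKEND_INMEMORY,   /* rb-tree + LRU, lost on unmount */
	BTRFS_DEDUP_BACKEND_ONDISK,     /* persistent dedup tree */
};

enum btrfs_dedup_hash_algo {
	BTRFS_DEDUP_HASH_SHA256,        /* only supported algorithm so far */
};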
Commits on Jan 11, 2016
-
Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>
masoncl committed Jan 11, 2016
-
Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>
masoncl committed Jan 11, 2016
-
Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
masoncl committed Jan 11, 2016
Commits on Jan 7, 2016
-
Btrfs: fix fitrim discarding device area reserved for boot loader's use
As of the 4.3 kernel release, the fitrim ioctl can now discard any region of a disk that is not allocated to any chunk/block group, including the first megabyte which is used for our primary superblock and by the boot loader (grub for example). Fix this by not allowing any region of the device starting at an offset not greater than min(alloc_start_mount_option, 1Mb) to be trimmed/discarded, just as was the case before 4.3.

A reproducer test case for xfstests follows.

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1	# failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
	cd /
	rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch

rm -f $seqres.full

_scratch_mkfs >>$seqres.full 2>&1

# Write to the [0, 64Kb[ and [68Kb, 1Mb[ ranges of the device. These ranges are
# reserved for a boot loader to use (GRUB for example) and btrfs should never
# use them - neither for allocating metadata/data nor should trim/discard them.
# The range [64Kb, 68Kb[ is used for the primary superblock of the filesystem.
$XFS_IO_PROG -c "pwrite -S 0xfd 0 64K" $SCRATCH_DEV | _filter_xfs_io
$XFS_IO_PROG -c "pwrite -S 0xfd 68K 956K" $SCRATCH_DEV | _filter_xfs_io

# Now mount the filesystem and perform a fitrim against it.
_scratch_mount
_require_batched_discard $SCRATCH_MNT
$FSTRIM_PROG $SCRATCH_MNT

# Now unmount the filesystem and verify the content of the ranges was not
# modified (no trim/discard happened on them).
_scratch_unmount
echo "Content of the ranges [0, 64Kb] and [68Kb, 1Mb[ after fitrim:"
od -t x1 -N $((64 * 1024)) $SCRATCH_DEV
od -t x1 -j $((68 * 1024)) -N $((956 * 1024)) $SCRATCH_DEV

status=0
exit

Reported-by: Vincent Petry <PVince81@yahoo.fr>
Reported-by: Andrei Borzenkov <arvidjaar@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109341
Fixes: 499f377 (btrfs: iterate over unused chunk space in FITRIM)
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
fdmanana committed Jan 7, 2016
-
Btrfs: Check metadata redundancy on balance
When converting a filesystem via balance, check that the metadata mode is at least as redundant as the data mode. For example, give a warning for: -dconvert=raid1 -mconvert=single Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk> [ minor message reformatting ] Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: statfs: report zero available if metadata are exhausted
There is one ENOSPC case that's very confusing: Available is greater than zero but no file operation succeeds (besides removing files). This happens when the metadata are exhausted and there's no possibility to allocate another chunk. In this scenario it's normal that there's still some space in the data chunk and the calculation in df reflects that in the Avail value. To at least give some clue about the ENOSPC situation, let statfs report zero in Avail, even if there's still data space available.

Current:
/dev/sdb1 4.0G 3.3G 719M 83% /mnt/test
New:
/dev/sdb1 4.0G 3.3G 0 100% /mnt/test

We calculate the remaining metadata space minus the global reserve. If this is (supposedly) smaller than zero, there's no space. But this does not hold in practice: the exhausted state happens while there's still some positive delta. So we apply some guesswork and compare the delta to a 4M threshold. (The practically observed delta was 2M.) We probably cannot calculate the exact threshold value because it depends on the internal reservations requested by various operations, so some operations that consume only a little metadata will succeed even if Avail is zero. But this is better than the other way around. Signed-off-by: David Sterba <dsterba@suse.com>
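A minimal illustrative sketch of the heuristic described above (the names and the helper are not the actual statfs code):

#include <stdbool.h>
#include <stdint.h>

#define META_SLACK (4ULL << 20)   /* the ~4M guesswork threshold from above */

/* Report Avail as zero once free metadata minus the global reserve
 * drops under the slack, even though a small positive delta remains. */
static bool metadata_exhausted(uint64_t meta_free, uint64_t global_rsv)
{
	return meta_free < global_rsv + META_SLACK;
}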
kdave committed Jan 7, 2016
-
btrfs: preallocate path for snapshot creation at ioctl time
We can also preallocate the btrfs_path that's used during pending snapshot creation and avoid another late ENOMEM failure. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: allocate root item at snapshot ioctl time
The actual snapshot creation is delayed until transaction commit. If we cannot get enough memory for the root item there, we have to fail the whole transaction commit, which is bad. So we'll allocate the memory at ioctl time and pass it along with the pending_snapshot struct. The potential ENOMEM is then returned to the caller of the snapshot ioctl. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: do an allocation earlier during snapshot creation
We can allocate pending_snapshot earlier and do not have to do cleanup in case of failure. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: use smaller type for btrfs_path locks
The values of btrfs_path::locks are 0 to 4, so they fit into a u8. Let's see:

* overall size of btrfs_path drops from 136 to 112 (-24 bytes)
* better packing in a slab page: +6 objects
* the whole structure now fits into 2 cachelines
* slight decrease in code size:

   text    data     bss     dec     hex filename
 938731   43670   23144 1005545   f57e9 fs/btrfs/btrfs.ko.before
 938203   43670   23144 1005017   f55d9 fs/btrfs/btrfs.ko.after

(and the generated assembly does not change much)

The main purpose is to decrease the size of the structure without affecting performance. Byte access usually behaves well across arches; the locks are not accessed frequently and are sometimes just compared to zero.

Note for further size reduction attempts: the slots could be made u16, but this might generate worse code on some arches (non-byte and non-int access). Also, the range of operations on slots is wider compared to locks, and the potential performance drop should be evaluated first. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: use smaller type for btrfs_path lowest_level
The level is 0..7, so we can use a smaller type. The size of btrfs_path is now 136 bytes, down from 144, which lets 2 more objects fit into a 4k slab. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: use smaller type for btrfs_path reada
The possible values for reada are all positive and bounded, so we can later save some bytes by storing it in a u8. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: cleanup, use enum values for btrfs_path reada
Replace the integers with enums for better readability. The value 2 has not had any meaning since a717531 ("Btrfs: do less aggressive btree readahead", 2009-01-22). Signed-off-by: David Sterba <dsterba@suse.com>
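The named values end up looking roughly like the following (the exact definition lives in ctree.h):

/* Readahead hints for btrfs_path::reada, replacing the old magic numbers;
 * callers then write e.g. "path->reada = READA_FORWARD;". */
enum { READA_NONE = 0, READA_BACK, READA_FORWARD };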
kdave committed Jan 7, 2016
-
btrfs: constify static arrays
There are a few statically initialized arrays that can be made const. The remaining ones (like file_system_type, sysfs attributes or prop handlers) do not allow that, due to type mismatches when passed to the APIs or because the structures are modified through other members. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: constify remaining structs with function pointers
* struct extent_io_ops
* struct btrfs_free_space_op
Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs tests: replace whole ops structure for free space tests
Preparatory work for making btrfs_free_space_op constant. In test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with our own version, thus preventing constification. We can rework it so that we replace the whole structure with one that has the correct function pointers. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Jan 7, 2016
-
btrfs: use list_for_each_entry* in backref.c
Use list_for_each_entry*() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>