Qu-Wenruo/Btrf…
Commits on Mar 22, 2016
-
btrfs: dedupe: Fix a space cache delalloc bytes underflow bug
Dedupe has a bug that underflows block_group_cache->delalloc_bytes, so the counter can never return to 0. As a result, the free space cache for that block group is never written to disk, and the following kernel message appears at umount:
BTRFS info (device vdc): The free space cache file (1485570048) is invalid. skip it
Reported-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: relocation: Enhance error handling to avoid BUG_ON
Since the introduction of the btrfs dedupe tree, balance can race with dedupe disabling. When this happens, dedupe_enabled will make btrfs_get_fs_root() return PTR_ERR(-ENOENT). But due to a bug in the error handling branch, backref_cache->nr_nodes is increased while the node is neither added to backref_cache nor is nr_nodes decreased, causing a BUG_ON() in backref_cache_cleanup():
[ 2611.668810] ------------[ cut here ]------------
[ 2611.669946] kernel BUG at /home/sat/ktest/linux/fs/btrfs/relocation.c:243!
[ 2611.670572] invalid opcode: 0000 [#1] SMP
[ 2611.686797] Call Trace:
[ 2611.687034] [<ffffffffa01f71d3>] btrfs_relocate_block_group+0x1b3/0x290 [btrfs]
[ 2611.687706] [<ffffffffa01cc177>] btrfs_relocate_chunk.isra.40+0x47/0xd0 [btrfs]
[ 2611.688385] [<ffffffffa01cdb12>] btrfs_balance+0xb22/0x11e0 [btrfs]
[ 2611.688966] [<ffffffffa01d9611>] btrfs_ioctl_balance+0x391/0x3a0 [btrfs]
[ 2611.689587] [<ffffffffa01ddaf0>] btrfs_ioctl+0x1650/0x2290 [btrfs]
[ 2611.690145] [<ffffffff81171cda>] ? lru_cache_add+0x3a/0x80
[ 2611.690647] [<ffffffff81171e4c>] ? lru_cache_add_active_or_unevictable+0x4c/0xc0
[ 2611.691310] [<ffffffff81193f04>] ? handle_mm_fault+0xcd4/0x17f0
[ 2611.691842] [<ffffffff811da423>] ? cp_new_stat+0x153/0x180
[ 2611.692342] [<ffffffff8119913d>] ? __vma_link_rb+0xfd/0x110
[ 2611.692842] [<ffffffff81199209>] ? vma_link+0xb9/0xc0
[ 2611.693303] [<ffffffff811e7e81>] do_vfs_ioctl+0xa1/0x5a0
[ 2611.693781] [<ffffffff8104e024>] ? __do_page_fault+0x1b4/0x400
[ 2611.694310] [<ffffffff811e83c1>] SyS_ioctl+0x41/0x70
[ 2611.694758] [<ffffffff816dfc6e>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2611.695331] Code: ff 48 8b 45 bf 49 83 af a8 05 00 00 01 49 89 87 a0 05 00 00 e9 2e fd ff ff b8 f4 ff ff ff e9 e4 fb ff ff 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 41 89 c6 e9 b8 fb ff ff e8 9e a6 e8 e0 4c 89 e7 44
[ 2611.697870] RIP [<ffffffffa01f6fc1>] relocate_block_group+0x741/0x7a0 [btrfs]
[ 2611.698818] RSP <ffff88002a81fb30>
This patch calls remove_backref_node() in the error handling branch, catches the returned -ENOENT in relocate_tree_block(), and continues balancing. Reported-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
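The counter imbalance described above can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: `BackrefCache`, `build_node()` and `get_root()` are hypothetical stand-ins, and `LookupError` plays the role of -ENOENT.

```python
class BackrefCache:
    """Toy stand-in for the relocation backref cache; it only tracks a node count."""

    def __init__(self):
        self.nr_nodes = 0
        self.nodes = {}

    def alloc_node(self, bytenr):
        self.nr_nodes += 1              # counted as soon as the node is allocated
        return {"bytenr": bytenr}

    def remove_node(self, node):
        self.nodes.pop(node["bytenr"], None)
        self.nr_nodes -= 1

    def cleanup(self):
        for node in list(self.nodes.values()):
            self.remove_node(node)
        # Plays the role of the BUG_ON() in backref_cache_cleanup().
        assert self.nr_nodes == 0, "leaked backref node"


def build_node(cache, bytenr, get_root):
    node = cache.alloc_node(bytenr)
    try:
        node["root"] = get_root(bytenr)
    except LookupError:
        cache.remove_node(node)         # the step the buggy error path skipped
        raise
    cache.nodes[bytenr] = node
    return node


def get_root(bytenr):
    # Hypothetical lookup: pretend bytenr 2 belongs to a root that just went away.
    if bytenr == 2:
        raise LookupError("-ENOENT")
    return "fs_root"


cache = BackrefCache()
relocated = []
for bytenr in (1, 2, 3):
    try:
        build_node(cache, bytenr, get_root)
        relocated.append(bytenr)
    except LookupError:
        continue                        # catch the error and keep balancing

nodes_before_cleanup = cache.nr_nodes
cache.cleanup()
```

Without the `remove_node()` call in the error path, the count would stay one too high and the final assertion in `cleanup()` would fire, which mirrors the BUG_ON() in the trace.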
-
btrfs: dedupe: Add support for compression and dedupe
The basic idea is to calculate the hash before compression, and to add the members dedupe needs to record a compressed file extent. Dedupe supports a dedupe_bs larger than 128K, which is the upper limit of a compressed file extent. In that case we skip dedupe and prefer compression, since at that size the dedupe rate is low and the benefit of compression is more obvious. The current implementation is far from elegant. The most elegant one would split every data processing method into its own independent function, with a unified function to coordinate them. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
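As a rough illustration of that ordering, the following Python sketch hashes the uncompressed data first and skips dedupe when dedupe_bs exceeds 128K. The function `process_range()` and its parameters are hypothetical, and zlib and SHA-256 from the Python standard library merely stand in for the kernel's compression and hashing paths.

```python
import hashlib
import zlib

# 128K, the compressed file extent limit named in the commit message.
MAX_COMPRESSED_EXTENT = 128 * 1024


def process_range(data, dedupe_bs, dedupe=True, compress=True):
    """Return (digest or None, payload) for one range of file data."""
    # With both features on and a dedupe block size above the compressed
    # extent limit, prefer compression and skip dedupe.
    if dedupe and compress and dedupe_bs > MAX_COMPRESSED_EXTENT:
        dedupe = False
    # The hash is calculated on the uncompressed data, before compression.
    digest = hashlib.sha256(data).hexdigest() if dedupe else None
    payload = zlib.compress(data) if compress else data
    return digest, payload


data = b"btrfs" * 1000
small_digest, small_payload = process_range(data, dedupe_bs=64 * 1024)
large_digest, large_payload = process_range(data, dedupe_bs=256 * 1024)
```

Hashing before compression means two identical ranges produce the same digest regardless of how the compressor encodes them.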
-
btrfs: dedupe: Preparation for compress-dedupe co-work
For dedupe to work with compression, new members recording compression algorithm and on-disk extent length are needed. Add them for later compress-dedupe co-work. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Avoid submit IO for hash hit extent
Before this patch, even a duplicated extent still went through page write, meaning we didn't skip IO for it. Such a write would be skipped at the block level, since the block level only selects the last submitted write request to the same bytenr. This patch manually skips such IO to reduce dedupe overhead. After this patch, dedupe performance in the all-miss case is higher than compression performance in the low-compression-ratio case. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Fix metadata balance error when dedupe is enabled
A missing branch in btrfs_get_fs_root() causes dedupe_root to be read from disk with the REF_COWS bit set. This makes btrfs balance treat dedupe_root as an fs root and reuse the old dedupe root bytenr to drop the tree ref, causing the following kernel warning after metadata balancing:
BTRFS error (device sdb6): unable to find ref byte nr 29736960 parent 0 root 11 owner 0 offset 0
------------[ cut here ]------------
WARNING: CPU: 1 PID: 19113 at fs/btrfs/extent-tree.c:6636 __btrfs_free_extent.isra.66+0xb6d/0xd20 [btrfs]()
BTRFS: Transaction aborted (error -2)
Modules linked in: btrfs(O) xor zlib_deflate raid6_pq xfs [last unloaded: btrfs]
CPU: 1 PID: 19113 Comm: btrfs Tainted: G W O 4.5.0-rc5+ #2
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
0000000000000000 ffff880035b0ba18 ffffffff813771ff ffff880035b0ba60
ffffffffa06a810a ffff880035b0ba50 ffffffff810bcb81 ffff88003c45c528
0000000001c5c000 00000000fffffffe ffff88003dc8c520 0000000000000000
Call Trace:
[<ffffffff813771ff>] dump_stack+0x67/0x98
[<ffffffff810bcb81>] warn_slowpath_common+0x81/0xc0
[<ffffffff810bcc07>] warn_slowpath_fmt+0x47/0x50
[<ffffffffa06028fd>] __btrfs_free_extent.isra.66+0xb6d/0xd20 [btrfs]
[<ffffffffa0606d4d>] __btrfs_run_delayed_refs.constprop.71+0x96d/0x1560 [btrfs]
[<ffffffff81202ad9>] ? cmpxchg_double_slab.isra.68+0x149/0x160
[<ffffffff81106a1d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffa060a5ce>] btrfs_run_delayed_refs+0x8e/0x2d0 [btrfs]
[<ffffffffa06209fe>] btrfs_commit_transaction+0x3e/0xb50 [btrfs]
[<ffffffffa069f26e>] ? btrfs_dedupe_disable+0x28e/0x2c0 [btrfs]
[<ffffffff812035c3>] ? kfree+0x223/0x270
[<ffffffffa069f27a>] btrfs_dedupe_disable+0x29a/0x2c0 [btrfs]
[<ffffffffa065e403>] btrfs_ioctl+0x2363/0x2a40 [btrfs]
[<ffffffff8116b12a>] ? __audit_syscall_entry+0xaa/0xf0
[<ffffffff81137ce6>] ? current_kernel_time64+0x56/0xa0
[<ffffffff8122080e>] do_vfs_ioctl+0x8e/0x690
[<ffffffff8116b12a>] ? __audit_syscall_entry+0xaa/0xf0
[<ffffffff8122c181>] ? __fget_light+0x61/0x90
[<ffffffff81220e84>] SyS_ioctl+0x74/0x80
[<ffffffff8180ad57>] entry_SYSCALL_64_fastpath+0x12/0x6f
---[ end trace 618d5a5bc21d6a7c ]---
Fix it by adding the corresponding branch to btrfs_get_fs_root(). Reported-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: Fix a memory leak in inband dedupe hash
We allocate a dedupe hash into async_extent, but forget to free it. Fix it by freeing the hash before freeing async_extent. Reported-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Fix a bug when running inband dedupe with balance
When running inband dedupe with balance, inband dedupe may still increase the ref on extents that are in an RO chunk. This may either make find_data_references() give a warning, or make run_delayed_refs() return -EIO and abort the transaction. The cause is that dedupe_del() is normally only called at run_delayed_ref() time, which is too late for the balance case. This patch fixes the bug by calling dedupe_del() at extent searching time during balance. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: try more times to alloc metadata reserve space
In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try to reserve is calculated from the difference between outstanding_extents and reserved_extents. When reserve_metadata_bytes() fails to reserve the desired metadata space, it has already done some reclaim work, such as writing ordered extents. In that case, outstanding_extents and reserved_extents may already have changed, and a new attempt may be able to reserve enough metadata space. So this patch calls reserve_metadata_bytes() at most 3 times to ensure we have really run out of space. Such false ENOSPC is mainly caused by small file extents and time-consuming delalloc functions, which mainly affects in-band de-duplication. (Compression should also be affected, but LZO/zlib is faster than SHA256, so it is harder to trigger than with dedupe.) Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
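The retry idea can be sketched as a bounded loop that recomputes the request on every attempt. This is a simplified Python model, not the kernel function: `reserve_with_retry()`, the dictionary-based inode and `fake_reserve()` are hypothetical, and the reclaim side effect is simulated by hand.

```python
MAX_RESERVE_TRIES = 3   # "at most 3 times", per the commit message


class NoSpace(Exception):
    """Stand-in for -ENOSPC."""


def reserve_with_retry(inode, try_reserve, bytes_per_extent):
    """Return the number of attempts used, or raise NoSpace."""
    for attempt in range(1, MAX_RESERVE_TRIES + 1):
        # Recompute on every attempt: reclaim triggered by a failed attempt
        # may have changed the counters in the meantime.
        nr_extents = max(inode["outstanding_extents"] - inode["reserved_extents"], 0)
        if try_reserve(nr_extents * bytes_per_extent):
            inode["reserved_extents"] += nr_extents
            return attempt
    raise NoSpace()


inode = {"outstanding_extents": 10, "reserved_extents": 2}
free_bytes = [4 * 4096]


def fake_reserve(nbytes):
    # Hypothetical allocator: a failed attempt "reclaims" by finishing four
    # ordered extents, which lowers the next request.
    if nbytes <= free_bytes[0]:
        free_bytes[0] -= nbytes
        return True
    inode["outstanding_extents"] -= 4
    return False


tries = reserve_with_retry(inode, fake_reserve, bytes_per_extent=4096)
```

In this model the first attempt asks for 8 extents and fails, while the second asks for only 4 and succeeds, which is the false ENOSPC case the message describes.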
-
btrfs: dedupe: add per-file online dedupe control
Introduce inode_need_dedupe() to implement per-file online dedupe control. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: add a property handler for online dedupe
We use btrfs extended attribute "btrfs.dedupe" to record per-file online dedupe status, so add a dedupe property handler. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: add an inode nodedupe flag
Introduce the BTRFS_INODE_NODEDUP flag, so that we can explicitly disable online data deduplication for specified files. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Add ioctl for inband deduplication
Add an ioctl interface for inband deduplication, which includes:
1) enable
2) disable
3) status
We will later add an ioctl to disable inband dedupe for a given file/dir. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Add support for adding hash for on-disk backend
The on-disk backend can now add hashes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Add support to delete hash for on-disk backend
The on-disk backend can now delete hashes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Add support for on-disk hash search
The on-disk backend should now be able to search for hashes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Introduce interfaces to resume and cleanup dedupe info
Since we will introduce a new on-disk based dedupe method, introduce new interfaces to resume previous dedupe setup. And since we introduce a new tree for status, also add disable handler for it. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Add basic tree structure for on-disk dedupe method
Introduce a new tree, the dedupe tree, to record on-disk dedupe hashes. It serves as persistent hash storage instead of an in-memory-only implementation. Unlike Liu Bo's implementation, this version does not use a hack for the bytenr -> hash search, but adds a new type, DEDUP_BYTENR_ITEM, for that search case, just like the in-memory backend. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
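The idea of keeping a separate item type for the bytenr -> hash direction can be pictured as two indexes that are always updated together. The Python class below is only a sketch under that assumption: `DedupeTreeModel` and its methods are hypothetical names, and plain dictionaries stand in for the on-disk tree items.

```python
class DedupeTreeModel:
    """Two lookups kept in sync: hash -> bytenr and bytenr -> hash."""

    def __init__(self):
        self.hash_items = {}     # digest -> bytenr, used when searching for duplicates
        self.bytenr_items = {}   # bytenr -> digest, used when an extent is freed

    def add(self, digest, bytenr):
        self.hash_items[digest] = bytenr
        self.bytenr_items[bytenr] = digest

    def search(self, digest):
        return self.hash_items.get(digest)

    def delete(self, bytenr):
        # Deletion starts from the extent address, so the reverse index is
        # what makes it a direct lookup instead of a scan over all hashes.
        digest = self.bytenr_items.pop(bytenr, None)
        if digest is not None:
            self.hash_items.pop(digest, None)
        return digest


tree = DedupeTreeModel()
tree.add(b"\x01" * 32, 4096)
tree.add(b"\x02" * 32, 8192)
found = tree.search(b"\x01" * 32)
removed = tree.delete(4096)
```

The cost of this design is one extra item per hash, in exchange for a straightforward lookup in both directions.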
-
btrfs: dedupe: Inband in-memory only de-duplication implement
Core implementation of inband de-duplication. It reuses the async_cow_start() facility to calculate the dedupe hash, and uses the dedupe hash to do inband de-duplication at extent level. The work flow is as below:
1) Run delalloc range for an inode
2) Calculate hash for the delalloc range at the unit of dedupe_bs
3) For the hash match (duplicated) case, just increase the source extent ref and insert the file extent. For the hash mismatch case, go through the normal cow_file_range() fallback, and add the hash into dedupe_tree. Compression for the hash miss case is not supported yet.
The current implementation stores all dedupe hashes in an in-memory rb-tree, with LRU behavior to control the limit. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
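The work flow above can be sketched as a small user-space model. This is an illustration only: `InMemoryDedupe` is a hypothetical class, an `OrderedDict` stands in for the rb-tree plus LRU list, and a simple bump allocator stands in for cow_file_range().

```python
import hashlib
from collections import OrderedDict


class InMemoryDedupe:
    def __init__(self, dedupe_bs, limit):
        self.dedupe_bs = dedupe_bs
        self.limit = limit              # maximum number of hashes kept in memory
        self.hashes = OrderedDict()     # digest -> bytenr, least recently used first
        self.refs = {}                  # bytenr -> reference count
        self.next_bytenr = 0

    def write(self, data):
        """Return the bytenr backing each dedupe_bs-sized block of data."""
        extents = []
        for off in range(0, len(data), self.dedupe_bs):
            block = data[off:off + self.dedupe_bs]
            digest = hashlib.sha256(block).digest()
            bytenr = self.hashes.get(digest)
            if bytenr is not None:
                # Hash hit: only add a reference to the existing extent.
                self.refs[bytenr] += 1
                self.hashes.move_to_end(digest)
            else:
                # Hash miss: allocate a new extent and remember its hash.
                bytenr = self.next_bytenr
                self.next_bytenr += self.dedupe_bs
                self.refs[bytenr] = 1
                self.hashes[digest] = bytenr
                if len(self.hashes) > self.limit:
                    self.hashes.popitem(last=False)   # drop the LRU hash
            extents.append(bytenr)
        return extents


dedupe = InMemoryDedupe(dedupe_bs=4, limit=2)
first = dedupe.write(b"AAAABBBBAAAA")   # third block duplicates the first
second = dedupe.write(b"CCCC")          # pushes the table over its limit
third = dedupe.write(b"BBBB")           # its hash was evicted, so this is a miss
```

Evicting a hash only loses a dedupe opportunity: the later `BBBB` write gets a fresh extent while the data already on disk stays valid.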
-
btrfs: ordered-extent: Add support for dedupe
Add ordered-extent support for dedupe. Note, current ordered-extent support only supports non-compressed source extent. Support for compressed source extent will be added later. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface
Unlike the dedupe methods (in-memory or on-disk), only one hash method, SHA256, is supported so far, so implement the btrfs_dedupe_calc_hash() interface using SHA256. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Introduce function to search for an existing hash
Introduce the static function inmem_search() to handle the job for the in-memory hash tree. The trick is that we must ensure the delayed ref head is not being run at the time we search for the hash. With inmem_search(), we can implement the btrfs_dedupe_search() interface. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: delayed-ref: Add support for increasing data ref under spinlock
For in-band dedupe, btrfs needs to increase data ref with delayed_ref locked, so add a new function btrfs_add_delayed_data_ref_lock() to increase extent ref with delayed_refs already locked. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-
btrfs: dedupe: Introduce function to remove hash from in-memory tree
Introduce static function inmem_del() to remove hash from in-memory dedupe tree. And implement btrfs_dedupe_del() and btrfs_dedup_destroy() interfaces. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Introduce function to add hash into in-memory tree
Introduce static function inmem_add() to add hash into in-memory tree. And now we can implement the btrfs_dedupe_add() interface. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Introduce function to initialize dedupe info
Add generic function to initialize dedupe info. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
-
btrfs: dedupe: Introduce dedupe framework and its header
Introduce the btrfs online (write time) de-duplication framework and its header. The new de-duplication framework is going to support 2 different dedupe methods and 1 dedupe hash. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Commits on Mar 14, 2016
-
btrfs: Fix misspellings in comments.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: Print Warning only if ENOSPC_DEBUG is enabled
Don't print a warning for an ENOSPC error unless ENOSPC_DEBUG is enabled. Use btrfs_debug if it is enabled. Signed-off-by: Ashish Samant <ashish.samant@oracle.com> [ preserve the WARN_ON ] Signed-off-by: David Sterba <dsterba@suse.com>
Commits on Mar 11, 2016
-
btrfs: scrub: silence an uninitialized variable warning
It's basically harmless if "ref_level" isn't initialized since it's only used for an error message, but it causes a static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: move btrfs_compression_type to compression.h
So that it's better organized. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: rename btrfs_print_info to btrfs_print_mod_info
So that it indicates what it does. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Btrfs: Show a warning message if one of objectid reaches its highest value
It's better to show a warning message for the exceptional case where one of the objectids (in most cases, the inode number) reaches its highest value. For example, if the inode cache is off and this event happens, we can't create any file even if there are not so many files. This message makes such a problem easier to detect. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Documentation: btrfs: remove usage specific information
The document in the kernel sources is yet another place where the documentation would need to be updated, while it is not the primary source. We actively maintain the wiki pages. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committed Mar 11, 2016
-
btrfs: use kbasename in btrfsic_mount
This is more readable. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>