Structural cleanup for filesystem-based swap
Linux primarily uses IO to block devices for swap, but can send the IO
requests to a filesystem.  This has only ever worked for NFS, and that
hasn't worked for a while due to a lack of testing.  This seems like a
good time for some tidy-up before restoring swap-over-NFS functionality.

This patch:
 - updates the documentation (both copies!) for swap_activate which
   is woefully out-of-date
 - introduces a new address_space operation "swap_rw" for swap IO.
   The code currently used ->readpage for reads and ->direct_IO for
   writes.  The former imposes a limit of one-page-at-a-time, the
   latter means that direct writes and swap writes are encouraged to
   use the same path.  While similar, swap can often be simpler as
   it can assume that no allocation is needed, and coherence with the
   page cache is irrelevant.
 - moves the responsibility for setting SWP_FS_OPS to ->swap_activate()
   and also requires it to always call add_swap_extent().  This makes
   it much easier to find filesystems that require SWP_FS_OPS.
 - drops the call to the filesystem for ->set_page_dirty().  These
   pages do not belong to the filesystem, and it has no interest
   in the dirty status.

Writeout is switched to ->swap_rw, but read-in is not, as that requires
too much change for this patch.

Both cifs and nfs set SWP_FS_OPS but neither provides a swap_rw, so both
will now fail to activate swap.  cifs never really tried to provide swap
support as ->direct_IO always returns an error.  NFS will be fixed up
in following patches.

Signed-off-by: NeilBrown <neilb@suse.de>
neilbrown authored and intel-lab-lkp committed Dec 16, 2021
1 parent 68f87ec commit 6443c9d
Showing 9 changed files with 56 additions and 43 deletions.
18 changes: 12 additions & 6 deletions Documentation/filesystems/locking.rst
@@ -265,8 +265,9 @@ prototypes::
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
int (*swap_activate)(struct file *);
int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span);
int (*swap_deactivate)(struct file *);
int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);

locking rules:
All except set_page_dirty and freepage may block
@@ -295,6 +296,7 @@ is_partially_uptodate: yes
error_remove_page: yes
swap_activate: no
swap_deactivate: no
swap_rw: yes, unlocks
====================== ======================== ========= ===============

->write_begin(), ->write_end() and ->readpage() may be called from
@@ -397,15 +399,19 @@ cleaned, or an error value if not. Note that in order to prevent the page
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.

->swap_activate will be called with a non-zero argument on
files backing (non block device backed) swapfiles. A return value
of zero indicates success, in which case this file can be used for
backing swapspace. The swapspace operations will be proxied to the
address space operations.
->swap_activate() will be called to prepare the given file for swap. It
should perform any validation and preparation necessary to ensure that
writes can be performed with minimal memory allocation. It should call
add_swap_extent(), or the helper iomap_swapfile_activate(), and return
the number of extents added. If IO should be submitted through
->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted
directly to the block device ``sis->bdev``.

->swap_deactivate() will be called in the sys_swapoff()
path after ->swap_activate() returned success.

->swap_rw will be called for swap IO if ->swap_activate() set SWP_FS_OPS.

file_lock_operations
====================

17 changes: 12 additions & 5 deletions Documentation/filesystems/vfs.rst
@@ -751,8 +751,9 @@ cache in your filesystem. The following members are defined:
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
int (*swap_activate)(struct file *);
int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span);
int (*swap_deactivate)(struct file *);
int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
``writepage``
@@ -959,15 +960,21 @@ cache in your filesystem. The following members are defined:
unless you have them locked or reference counts increased.

``swap_activate``
Called when swapon is used on a file to allocate space if
necessary and pin the block lookup information in memory. A
return value of zero indicates success, in which case this file
can be used to back swapspace.

Called to prepare the given file for swap. It should perform
any validation and preparation necessary to ensure that writes
can be performed with minimal memory allocation. It should call
add_swap_extent(), or the helper iomap_swapfile_activate(), and
return the number of extents added. If IO should be submitted
through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
be submitted directly to the block device ``sis->bdev``.

``swap_deactivate``
Called during swapoff on files where swap_activate was
successful.

``swap_rw``
Called to read or write swap pages when swap_activate() set SWP_FS_OPS.

The File Object
===============
7 changes: 6 additions & 1 deletion fs/cifs/file.c
@@ -4943,6 +4943,10 @@ static int cifs_swap_activate(struct swap_info_struct *sis,

cifs_dbg(FYI, "swap activate\n");

if (!swap_file->f_mapping->a_ops->swap_rw)
/* Cannot support swap */
return -EINVAL;

spin_lock(&inode->i_lock);
blocks = inode->i_blocks;
isize = inode->i_size;
@@ -4971,7 +4975,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
* from reading or writing the file
*/

return 0;
sis->flags |= SWP_FS_OPS;
return add_swap_extent(sis, 0, sis->max, 0);
}

static void cifs_swap_deactivate(struct file *file)
17 changes: 15 additions & 2 deletions fs/nfs/file.c
@@ -489,9 +489,14 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
{
unsigned long blocks;
long long isize;
int ret;
struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
struct inode *inode = file->f_mapping->host;

if (!file->f_mapping->a_ops->swap_rw)
/* Cannot support swap */
return -EINVAL;

spin_lock(&inode->i_lock);
blocks = inode->i_blocks;
isize = inode->i_size;
@@ -501,9 +506,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
return -EINVAL;
}

ret = rpc_clnt_swap_activate(clnt);
if (ret)
return ret;
ret = add_swap_extent(sis, 0, sis->max, 0);
if (ret < 0) {
rpc_clnt_swap_deactivate(clnt);
return ret;
}
*span = sis->pages;

return rpc_clnt_swap_activate(clnt);
sis->flags |= SWP_FS_OPS;
return ret;
}

static void nfs_swap_deactivate(struct file *file)
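The NFS hunk above acquires a transport-level resource first and releases it again if adding the swap extent fails. A minimal userspace sketch of that acquire/undo ordering follows. Every name in it (`struct swap_info`, `struct transport`, `add_extent`, `net_swap_activate`, the flag value, the `fail_add_extent` knob) is a hypothetical stand-in invented for illustration, not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not kernel definitions. */
#define SWP_FS_OPS 0x1u                 /* flag value chosen for this sketch */

struct swap_info {
    unsigned int flags;
    unsigned long max;                  /* size of the swap area in pages */
    unsigned long pages;                /* usable pages */
    int extents;
};

struct transport {
    int swap_users;                     /* models the "swap active" count */
};

static bool fail_add_extent;            /* knob to simulate a failure */

static int transport_swap_activate(struct transport *t)
{
    t->swap_users++;
    return 0;
}

static void transport_swap_deactivate(struct transport *t)
{
    t->swap_users--;
}

/* Returns the number of extents added (1 here) or a negative errno. */
static int add_extent(struct swap_info *sis, unsigned long start,
                      unsigned long nr_pages)
{
    (void)start;
    (void)nr_pages;
    if (fail_add_extent)
        return -ENOMEM;
    sis->extents++;
    return 1;
}

/* Same ordering as the hunk: acquire, add extent, undo on failure. */
static int net_swap_activate(struct swap_info *sis, struct transport *t,
                             unsigned long long *span)
{
    int ret = transport_swap_activate(t);

    if (ret)
        return ret;
    ret = add_extent(sis, 0, sis->max);
    if (ret < 0) {
        transport_swap_deactivate(t);   /* release what was acquired */
        return ret;
    }
    *span = sis->pages;
    sis->flags |= SWP_FS_OPS;           /* only set once everything succeeded */
    return ret;
}
```

In this sketch the flag is set last, so a failed activation leaves both the transport count and the flags untouched.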
1 change: 1 addition & 0 deletions include/linux/fs.h
@@ -415,6 +415,7 @@ struct address_space_operations {
int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
sector_t *span);
void (*swap_deactivate)(struct file *file);
int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};

extern const struct address_space_operations empty_aops;
1 change: 0 additions & 1 deletion include/linux/swap.h
@@ -427,7 +427,6 @@ extern int swap_writepage(struct page *page, struct writeback_control *wbc);
extern void end_swap_bio_write(struct bio *bio);
extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
bio_end_io_t end_write_func);
extern int swap_set_page_dirty(struct page *page);

int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
unsigned long nr_pages, sector_t start_block);
26 changes: 6 additions & 20 deletions mm/page_io.c
@@ -307,10 +307,9 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,

set_page_writeback(page);
unlock_page(page);
ret = mapping->a_ops->direct_IO(&kiocb, &from);
if (ret == PAGE_SIZE) {
ret = mapping->a_ops->swap_rw(&kiocb, &from);
if (ret == 0) {
count_vm_event(PSWPOUT);
ret = 0;
} else {
/*
* In the case of swap-over-nfs, this can be a
@@ -378,10 +377,11 @@ int swap_readpage(struct page *page, bool synchronous)
}

if (data_race(sis->flags & SWP_FS_OPS)) {
struct file *swap_file = sis->swap_file;
struct address_space *mapping = swap_file->f_mapping;
//struct file *swap_file = sis->swap_file;
//struct address_space *mapping = swap_file->f_mapping;

ret = mapping->a_ops->readpage(swap_file, page);
/* This needs to use ->swap_rw() */
ret = -EINVAL;
if (!ret)
count_vm_event(PSWPIN);
goto out;
@@ -434,17 +434,3 @@ int swap_readpage(struct page *page, bool synchronous)
psi_memstall_leave(&pflags);
return ret;
}

int swap_set_page_dirty(struct page *page)
{
struct swap_info_struct *sis = page_swap_info(page);

if (data_race(sis->flags & SWP_FS_OPS)) {
struct address_space *mapping = sis->swap_file->f_mapping;

VM_BUG_ON_PAGE(!PageSwapCache(page), page);
return mapping->a_ops->set_page_dirty(page);
} else {
return __set_page_dirty_no_writeback(page);
}
}
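The `__swap_writepage` hunk in this file changes the success test from `ret == PAGE_SIZE` (the byte count returned by `->direct_IO`) to `ret == 0` (the result of `->swap_rw`). The userspace sketch below models that difference in return conventions. All names are hypothetical, and wrapping a byte-count backend in a 0-or-errno function is an illustrative assumption rather than something this patch does. Mapping a short write to `-EIO` is likewise an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096           /* page size assumed for the sketch */

struct sketch_stats {
    unsigned long pswpout;              /* models the PSWPOUT event counter */
};

/* Old-style backend: returns bytes written, or a negative errno. */
typedef long (*byte_count_write_fn)(const void *buf, size_t len);

static long full_write(const void *buf, size_t len)
{
    (void)buf;
    return (long)len;
}

static long short_write(const void *buf, size_t len)
{
    (void)buf;
    return (long)(len / 2);
}

static long failing_write(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -ENOSPC;
}

/* New-style convention: 0 on success, negative errno on failure. */
static int sketch_swap_rw(byte_count_write_fn write_fn, const void *buf,
                          size_t len)
{
    long ret = write_fn(buf, len);

    if (ret < 0)
        return (int)ret;
    if ((size_t)ret != len)
        return -EIO;                    /* assumption: short write is an error */
    return 0;
}

/* Caller only needs to compare against zero, as in the new hunk. */
static int sketch_swap_writepage(byte_count_write_fn write_fn,
                                 const void *page, struct sketch_stats *st)
{
    int ret = sketch_swap_rw(write_fn, page, SKETCH_PAGE_SIZE);

    if (ret == 0)
        st->pswpout++;
    return ret;
}
```

With this convention the caller no longer needs to know the transfer size to decide whether the write succeeded.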
2 changes: 1 addition & 1 deletion mm/swap_state.c
@@ -30,7 +30,7 @@
*/
static const struct address_space_operations swap_aops = {
.writepage = swap_writepage,
.set_page_dirty = swap_set_page_dirty,
.set_page_dirty = __set_page_dirty_no_writeback,
#ifdef CONFIG_MIGRATION
.migratepage = migrate_page,
#endif
10 changes: 3 additions & 7 deletions mm/swapfile.c
@@ -2397,13 +2397,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)

if (mapping->a_ops->swap_activate) {
ret = mapping->a_ops->swap_activate(sis, swap_file, span);
if (ret >= 0)
sis->flags |= SWP_ACTIVATED;
if (!ret) {
sis->flags |= SWP_FS_OPS;
ret = add_swap_extent(sis, 0, sis->max, 0);
*span = sis->pages;
}
if (ret < 0)
return ret;
sis->flags |= SWP_ACTIVATED;
return ret;
}
