This repository was archived by the owner on Nov 8, 2023. It is now read-only.

Commit a3a14ff

Tudor Ambarus authored and Treehugger Robot committed
Merge 56e7a8b ("Merge tag 'vfs-6.15-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs") into android-mainline
Steps on the way to v6.15-rc1

Change-Id: Ib4c86291aaa1a9c7c07bc9af30e49559cacba631
Signed-off-by: Tudor Ambarus <tudordana@google.com>
2 parents c12adb1 + 56e7a8b commit a3a14ff

299 files changed: +9511 -7577 lines changed

Documentation/filesystems/index.rst

Lines changed: 0 additions & 1 deletion
@@ -118,7 +118,6 @@ Documentation for filesystem implementations.
 spufs/index
 squashfs
 sysfs
-sysv-fs
 tmpfs
 ubifs
 ubifs-authentication

Documentation/filesystems/iomap/design.rst

Lines changed: 9 additions & 0 deletions
@@ -246,6 +246,10 @@ The fields are as follows:
 * **IOMAP_F_PRIVATE**: Starting with this value, the upper bits can
 be set by the filesystem for its own purposes.
 
+* **IOMAP_F_ANON_WRITE**: Indicates that (write) I/O does not have a target
+block assigned to it yet and the file system will do that in the bio
+submission handler, splitting the I/O as needed.
+
 These flags can be set by iomap itself during file operations.
 The filesystem should supply an ``->iomap_end`` function if it needs
 to observe these flags:
@@ -352,6 +356,11 @@ operations:
 ``IOMAP_NOWAIT`` is often set on behalf of ``IOCB_NOWAIT`` or
 ``RWF_NOWAIT``.
 
+* ``IOMAP_DONTCACHE`` is set when the caller wishes to perform a
+buffered file I/O and would like the kernel to drop the pagecache
+after the I/O completes, if it isn't already being used by another
+thread.
+
 If it is necessary to read existing file contents from a `different
 <https://lore.kernel.org/all/20191008071527.29304-9-hch@lst.de/>`_
 device or address range on a device, the filesystem should return that

Documentation/filesystems/iomap/operations.rst

Lines changed: 29 additions & 13 deletions
@@ -131,6 +131,8 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
 
 * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``.
 
+* ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
+
 Internal per-Folio State
 ------------------------
 
@@ -283,7 +285,7 @@ The ``ops`` structure must be specified and is as follows:
 struct iomap_writeback_ops {
     int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
                       loff_t offset, unsigned len);
-    int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
+    int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
     void (*discard_folio)(struct folio *folio, loff_t pos);
 };
 
@@ -306,13 +308,12 @@ The fields are as follows:
 purpose.
 This function must be supplied by the filesystem.
 
-- ``prepare_ioend``: Enables filesystems to transform the writeback
-ioend or perform any other preparatory work before the writeback I/O
-is submitted.
+- ``submit_ioend``: Allows the file systems to hook into writeback bio
+submission.
 This might include pre-write space accounting updates, or installing
 a custom ``->bi_end_io`` function for internal purposes, such as
 deferring the ioend completion to a workqueue to run metadata update
-transactions from process context.
+transactions from process context before submitting the bio.
 This function is optional.
 
 - ``discard_folio``: iomap calls this function after ``->map_blocks``
@@ -341,7 +342,7 @@ This can happen in interrupt or process context, depending on the
 storage device.
 
 Filesystems that need to update internal bookkeeping (e.g. unwritten
-extent conversions) should provide a ``->prepare_ioend`` function to
+extent conversions) should provide a ``->submit_ioend`` function to
 set ``struct iomap_end::bio::bi_end_io`` to its own function.
 This function should call ``iomap_finish_ioends`` after finishing its
 own work (e.g. unwritten extent conversion).
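
The hook described in this hunk can be modelled in userspace. The following is a minimal sketch, not kernel code: the struct layouts are simplified stand-ins, and `myfs_end_io`, `myfs_submit_ioend`, `mock_submit_bio` and `mock_finish_ioends` are hypothetical names introduced only for illustration. It shows the ordering the text describes: the hook installs its own `bi_end_io`, and that handler does the filesystem's work before handing over to the generic completion step.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; names and layouts are simplified assumptions,
 * not the kernel definitions. */
struct bio {
    void (*bi_end_io)(struct bio *bio);  /* completion callback */
    void *bi_private;                    /* owner of this bio */
    int status;                          /* 0 on success, negative errno otherwise */
};

struct iomap_ioend {
    struct bio bio;
    int converted;   /* set once the filesystem has done its own work */
    int finished;    /* set once the generic completion has run */
};

struct iomap_writepage_ctx {
    struct iomap_ioend *ioend;
};

/* Mock of the generic completion step (iomap_finish_ioends in the text). */
static void mock_finish_ioends(struct iomap_ioend *ioend, int error)
{
    (void)error;
    ioend->finished = 1;
}

/* Hypothetical completion handler: own work first, then generic completion. */
static void myfs_end_io(struct bio *bio)
{
    struct iomap_ioend *ioend = bio->bi_private;

    ioend->converted = 1;   /* e.g. unwritten extent conversion */
    mock_finish_ioends(ioend, bio->status);
}

/* Mock of bio submission: completes the I/O immediately and successfully. */
static void mock_submit_bio(struct bio *bio)
{
    bio->status = 0;
    bio->bi_end_io(bio);
}

/* Hypothetical ->submit_ioend hook: install the handler, then submit. */
static int myfs_submit_ioend(struct iomap_writepage_ctx *wpc, int status)
{
    struct iomap_ioend *ioend = wpc->ioend;

    ioend->bio.bi_private = ioend;
    ioend->bio.bi_end_io = myfs_end_io;
    if (status)
        return status;      /* assumption: the caller handles the error */
    mock_submit_bio(&ioend->bio);
    return 0;
}

/* Usage example: one successful submission runs both completion steps. */
static void myfs_submit_ioend_example(void)
{
    struct iomap_ioend ioend = {0};
    struct iomap_writepage_ctx wpc = { .ioend = &ioend };

    assert(myfs_submit_ioend(&wpc, 0) == 0);
    assert(ioend.converted && ioend.finished);
}
```

In this model an incoming error status is returned unchanged and nothing is submitted; how a real filesystem handles that case is outside what the hunk states.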
@@ -515,18 +516,33 @@ IOMAP_WRITE`` with any combination of the following enhancements:
 
 * ``IOMAP_ATOMIC``: This write is being issued with torn-write
 protection.
-Only a single bio can be created for the write, and the write must
-not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
-set.
+Torn-write protection may be provided based on HW-offload or by a
+software mechanism provided by the filesystem.
+
+For HW-offload based support, only a single bio can be created for the
+write, and the write must not be split into multiple I/O requests, i.e.
+flag REQ_ATOMIC must be set.
 The file range to write must be aligned to satisfy the requirements
 of both the filesystem and the underlying block device's atomic
 commit capabilities.
 If filesystem metadata updates are required (e.g. unwritten extent
-conversion or copy on write), all updates for the entire file range
+conversion or copy-on-write), all updates for the entire file range
 must be committed atomically as well.
-Only one space mapping is allowed per untorn write.
-Untorn writes must be aligned to, and must not be longer than, a
-single file block.
+Untorn-writes may be longer than a single file block. In all cases,
+the mapping start disk block must have at least the same alignment as
+the write offset.
+The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an
+untorn-write based on HW-offload.
+
+For untorn-writes based on a software mechanism provided by the
+filesystem, all the disk block alignment and single bio restrictions
+which apply for HW-offload based untorn-writes do not apply.
+The mechanism would typically be used as a fallback for when
+HW-offload based untorn-writes may not be issued, e.g. the range of the
+write covers multiple extents, meaning that it is not possible to issue
+a single bio.
+All filesystem metadata updates for the entire file range must be
+committed atomically as well.
 
 Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
 calling this function.
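
The choice between the two untorn-write paths in the hunk above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the iomap or filesystem implementation: `struct extent`, `choose_untorn_mode` and the enum are hypothetical, all quantities are in bytes, and the alignment rule is read as "the write is a naturally aligned power-of-two length, and the disk address of its first byte is a multiple of that length".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum untorn_mode { UNTORN_HW_OFFLOAD, UNTORN_SOFTWARE };

/* Hypothetical extent record: file range [file_off, file_off + len)
 * is stored contiguously starting at disk_addr. */
struct extent {
    uint64_t file_off;
    uint64_t disk_addr;
    uint64_t len;
};

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Pick the path for an untorn write of len bytes at file offset off. */
static enum untorn_mode choose_untorn_mode(const struct extent *ext, size_t n,
                                           uint64_t off, uint64_t len)
{
    /* Assumption of this sketch: HW offload needs a naturally aligned,
     * power-of-two sized write. */
    if (!is_pow2(len) || off % len != 0)
        return UNTORN_SOFTWARE;

    for (size_t i = 0; i < n; i++) {
        const struct extent *e = &ext[i];
        uint64_t disk;

        if (off < e->file_off || off >= e->file_off + e->len)
            continue;                   /* write does not start here */
        if (off + len > e->file_off + e->len)
            return UNTORN_SOFTWARE;     /* spans extents: no single bio */
        disk = e->disk_addr + (off - e->file_off);
        if (disk % len != 0)
            return UNTORN_SOFTWARE;     /* disk address less aligned than the write */
        return UNTORN_HW_OFFLOAD;
    }
    return UNTORN_SOFTWARE;             /* no mapping covers the offset */
}

/* Usage example: a 16 KiB write inside one well-aligned extent. */
static void choose_untorn_mode_example(void)
{
    const struct extent map[] = { { 0, 1048576, 65536 } };

    assert(choose_untorn_mode(map, 1, 16384, 16384) == UNTORN_HW_OFFLOAD);
}
```

A write that crosses an extent boundary, or whose disk address is less aligned than the write itself, falls back to the software path in this model, which matches the fallback case the text gives as an example.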

Documentation/filesystems/locking.rst

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ prototypes::
 int (*link) (struct dentry *,struct inode *,struct dentry *);
 int (*unlink) (struct inode *,struct dentry *);
 int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
-int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
+struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
 int (*rmdir) (struct inode *,struct dentry *);
 int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
 int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,

Documentation/filesystems/overlayfs.rst

Lines changed: 19 additions & 5 deletions
@@ -292,21 +292,35 @@ rename or unlink will of course be noticed and handled).
 Permission model
 ----------------
 
+An overlay filesystem stashes credentials that will be used when
+accessing lower or upper filesystems.
+
+In the old mount api the credentials of the task calling mount(2) are
+stashed. In the new mount api the credentials of the task creating the
+superblock through FSCONFIG_CMD_CREATE command of fsconfig(2) are
+stashed.
+
+Starting with kernel v6.15 it is possible to use the "override_creds"
+mount option which will cause the credentials of the calling task to be
+recorded. Note that "override_creds" is only meaningful when used with
+the new mount api as the old mount api combines setting options and
+superblock creation in a single mount(2) syscall.
+
 Permission checking in the overlay filesystem follows these principles:
 
 1) permission check SHOULD return the same result before and after copy up
 
 2) task creating the overlay mount MUST NOT gain additional privileges
 
-3) non-mounting task MAY gain additional privileges through the overlay,
+3) task[*] MAY gain additional privileges through the overlay,
 compared to direct access on underlying lower or upper filesystems
 
 This is achieved by performing two permission checks on each access:
 
 a) check if current task is allowed access based on local DAC (owner,
 group, mode and posix acl), as well as MAC checks
 
-b) check if mounting task would be allowed real operation on lower or
+b) check if stashed credentials would be allowed real operation on lower or
 upper layer based on underlying filesystem permissions, again including
 MAC checks
 
@@ -315,10 +329,10 @@ are copied up. On the other hand it can result in server enforced
 permissions (used by NFS, for example) being ignored (3).
 
 Check (b) ensures that no task gains permissions to underlying layers that
-the mounting task does not have (2). This also means that it is possible
+the stashed credentials do not have (2). This also means that it is possible
 to create setups where the consistency rule (1) does not hold; normally,
-however, the mounting task will have sufficient privileges to perform all
-operations.
+however, the stashed credentials will have sufficient privileges to
+perform all operations.
 
 Another way to demonstrate this model is drawing parallels between::
 
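
Separately from the truncated hunk above, checks (a) and (b) of the permission model can be illustrated with a small userspace model. This is a deliberately simplified sketch, not overlayfs code: `struct cred` and `struct node` are reduced stand-ins, only owner and other mode bits are modelled (no groups, ACLs, capabilities or MAC), uid 0 is simply allowed everything, and `dac_allows` and `ovl_model_access` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Permission bits as used in the low three bits of a mode triplet. */
#define PERM_R 4u
#define PERM_W 2u
#define PERM_X 1u

/* Simplified stand-ins, assumptions of this sketch. */
struct cred { unsigned int uid; };
struct node { unsigned int owner; unsigned int mode; };

/* Owner/other DAC check only; uid 0 bypasses it in this model. */
static bool dac_allows(const struct cred *c, const struct node *n,
                       unsigned int want)
{
    unsigned int bits;

    if (c->uid == 0)
        return true;
    bits = (c->uid == n->owner) ? (n->mode >> 6) & 7u : n->mode & 7u;
    return (bits & want) == want;
}

/* Access through the overlay needs both checks to pass. */
static bool ovl_model_access(const struct cred *task, const struct cred *stashed,
                             const struct node *ovl, const struct node *real,
                             unsigned int want)
{
    if (!dac_allows(task, ovl, want))       /* check (a): current task */
        return false;
    return dac_allows(stashed, real, want); /* check (b): stashed credentials */
}

/* Usage example: the owner reads a 0600 file through a root-created overlay. */
static void ovl_model_example(void)
{
    const struct cred root = { 0 }, alice = { 1000 };
    const struct node file = { 1000, 0600 };

    assert(ovl_model_access(&alice, &root, &file, &file, PERM_R));
}
```

With unprivileged stashed credentials, even a uid 0 caller is refused by check (b) in this model, which is the property principle (2) describes.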

Documentation/filesystems/porting.rst

Lines changed: 47 additions & 1 deletion
@@ -1144,7 +1144,7 @@ and it *must* be opened exclusive.
 
 ---
 
-** mandatory**
+**mandatory**
 
 ->d_revalidate() gets two extra arguments - inode of parent directory and
 name our dentry is expected to have. Both are stable (dir is pinned in
@@ -1157,3 +1157,49 @@ in normal case it points into the pathname being looked up.
 NOTE: if you need something like full path from the root of filesystem,
 you are still on your own - this assists with simple cases, but it's not
 magic.
+
+---
+
+**recommended**
+
+kern_path_locked() and user_path_locked() no longer return a negative
+dentry so this doesn't need to be checked. If the name cannot be found,
+ERR_PTR(-ENOENT) is returned.
+
+---
+
+**recommended**
+
+lookup_one_qstr_excl() is changed to return errors in more cases, so
+these conditions don't require explicit checks:
+
+- if LOOKUP_CREATE is NOT given, then the dentry won't be negative,
+ERR_PTR(-ENOENT) is returned instead
+- if LOOKUP_EXCL IS given, then the dentry won't be positive,
+ERR_PTR(-EEXIST) is returned instead
+
+LOOKUP_EXCL now means "target must not exist". It can be combined with
+LOOKUP_CREATE or LOOKUP_RENAME_TARGET.
+
+---
+
+**mandatory**
+invalidate_inodes() is gone; use evict_inodes() instead.
+
+---
+
+**mandatory**
+
+->mkdir() now returns a dentry. If the created inode is found to
+already be in cache and have a dentry (often IS_ROOT()), it will need to
+be spliced into the given name in place of the given dentry. That dentry
+now needs to be returned. If the original dentry is used, NULL should
+be returned. Any error should be returned with ERR_PTR().
+
+In general, filesystems which use d_instantiate_new() to install the new
+inode can safely return NULL. Filesystems which may not have an I_NEW inode
+should use d_drop();d_splice_alias() and return the result of the latter.
+
+If a positive dentry cannot be returned for some reason, in-kernel
+clients such as cachefiles, nfsd, smb/server may not perform ideally but
+will fail-safe.
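
The three possible results of the new ->mkdir() convention (NULL, a different dentry, or an ERR_PTR() value) can be modelled in userspace. This is a hedged sketch, not VFS code: `ERR_PTR`, `PTR_ERR` and `IS_ERR` are local re-implementations of the kernel helpers' usual encoding, `struct dentry` is a two-field stand-in, and `model_mkdir` plus the three `mkdir_*` callbacks are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel's error-pointer helpers: small negative
 * errno values are encoded in the top 4095 addresses. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Simplified stand-in for a dentry. */
struct dentry { const char *name; int positive; };

typedef struct dentry *(*mkdir_fn)(struct dentry *dentry);

/* A dentry that the new inode already had in cache (hypothetical). */
static struct dentry cached_alias = { "alias", 1 };

/* Case 1: the given dentry was instantiated, so NULL is returned. */
static struct dentry *mkdir_uses_given(struct dentry *d)
{
    d->positive = 1;
    return NULL;
}

/* Case 2: an existing dentry was spliced in and must be returned. */
static struct dentry *mkdir_splices(struct dentry *d)
{
    (void)d;
    return &cached_alias;
}

/* Case 3: failure is reported through ERR_PTR(). */
static struct dentry *mkdir_fails(struct dentry *d)
{
    (void)d;
    return ERR_PTR(-EEXIST);
}

/* Caller side: resolve the return value to the dentry to keep using. */
static struct dentry *model_mkdir(struct dentry *d, mkdir_fn fn)
{
    struct dentry *res = fn(d);

    if (IS_ERR(res))
        return res;         /* propagate the encoded error */
    return res ? res : d;   /* NULL means the original dentry was used */
}
```

A caller in this model checks `IS_ERR()` first and otherwise continues with whichever dentry `model_mkdir()` hands back, so it never has to distinguish the NULL and spliced cases itself.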
