doc/cephfs/posix: update
Signed-off-by: Sage Weil <sage@redhat.com>
liewegas committed Dec 14, 2015
1 parent 98e8246 commit 4bc666e
Showing 3 changed files with 38 additions and 16 deletions.
1 change: 1 addition & 0 deletions doc/cephfs/index.rst
@@ -81,6 +81,7 @@ authentication keyring.
.. toctree::
   :maxdepth: 1

   POSIX compatibility <posix>
   CephFS Quotas <quota>
   Using Ceph with Hadoop <hadoop>
   libcephfs <../../api/libcephfs-java/>
37 changes: 37 additions & 0 deletions doc/cephfs/posix.rst
@@ -0,0 +1,37 @@
========================
Differences from POSIX
========================

CephFS aims to adhere to POSIX semantics wherever possible. For
example, in contrast to many other common network file systems like
NFS, CephFS maintains strong cache coherency across clients. The goal
is for processes communicating via the file system to behave the same
when they are on different hosts as when they are on the same host.

However, there are a few places where CephFS diverges from strict
POSIX semantics for various reasons:

- When multiple clients write to the same file simultaneously, a
  write that crosses an object boundary is not necessarily atomic.
  For example, if writer A writes "aa|aa" and writer B writes "bb|bb"
  at the same time (where | marks the object boundary), the result
  may be "aa|bb" rather than the correct "aa|aa" or "bb|bb".
- POSIX includes the telldir(3) and seekdir(3) functions, which allow
  you to obtain the current directory offset and seek back to it.
  Because CephFS may refragment directories at any time, it is
  difficult to return a stable integer offset for a directory. As
  such, a seekdir to a non-zero offset often works but is not
  guaranteed to do so. A seekdir to offset 0 always works (and is
  equivalent to rewinddir(3)).
- Sparse files are reported incorrectly in the stat(2) st_blocks
  field. Because CephFS does not explicitly track which parts of a
  file are allocated/written, the st_blocks field is always populated
  by the file size divided by the block size. This causes tools like
  du(1) to overestimate consumed space. (The recursive size field,
  maintained by CephFS, also includes file "holes" in its count.)
- When a file is mapped into memory via mmap(2) on multiple hosts,
  writes are not coherently propagated to other clients' caches. That
  is, if a page is cached on host A, and then updated on host B, host
  A's page is not coherently invalidated. (Shared writable mmap
  appears to be quite rare: we have yet to hear any complaints about
  this behavior, and implementing cache coherency properly is complex.)
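
As a rough illustration of the first and third items above, the following
Python sketch (not part of CephFS) checks whether a write would span more
than one object, and compares a file's apparent size with the space implied
by st_blocks. The helper names are hypothetical, and the 4 MiB object size is
an assumption: the actual size depends on the file's layout.

```python
import os

# Assumption: 4 MiB objects. The real object size depends on the file layout.
ASSUMED_OBJECT_SIZE = 4 * 1024 * 1024


def objects_touched(offset, length, object_size=ASSUMED_OBJECT_SIZE):
    """Return the indices of the objects covered by a write.

    The write starts at byte `offset` and is `length` bytes long.
    """
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


def crosses_object_boundary(offset, length, object_size=ASSUMED_OBJECT_SIZE):
    """True if the write spans more than one object.

    Such a write is one that may not be atomic with respect to other
    simultaneous writers.
    """
    return len(objects_touched(offset, length, object_size)) > 1


def apparent_and_reported_bytes(path):
    """Return (st_size, bytes implied by st_blocks) for `path`.

    On Linux, st_blocks counts 512-byte units. For a sparse file on a
    file system that tracks allocation, the second value is usually much
    smaller than the first. On CephFS the two are expected to be close,
    because st_blocks is derived from the file size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For instance, a 4-byte write starting 2 bytes before the end of the first
object corresponds to the "aa|aa" case:
``crosses_object_boundary(ASSUMED_OBJECT_SIZE - 2, 4)`` is true, whereas a
write that fits entirely inside one object is not affected.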
16 changes: 0 additions & 16 deletions doc/dev/differences-from-posix.rst

This file was deleted.
