Apr 08, 2014

  1. Richard Yao

    Improve partition detection on lesser used devices

    The format strings in efi_get_info() are intended to extract both the
    main device and partition number. However, this is only done correctly
    for hd, sd and vd devices. The format strings for ram, dm-, md and loop
    devices misparse the input. This causes the partition device to be
    incorrectly labelled as the main device with the partition being
    labelled 0.
    
    Reported-by: ilovezfs <ilovezfs@icloud.com>
    Signed-off-by: Richard Yao <ryao@gentoo.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #2175
    ryao authored committed
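
The split this commit describes (main device plus partition number) can be sketched in userspace as follows. This is not the efi_get_info() code; split_part_name() and has_direct_numbering() are hypothetical helpers. The sketch assumes the common Linux naming convention in which a disk whose name ends in a digit takes a 'p' before the partition number (loop0p1, md0p2), while sd, hd and vd devices append the number directly (sda1).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed for this sketch: only these prefixes append the number directly. */
static int
has_direct_numbering(const char *name)
{
	return (strncmp(name, "sd", 2) == 0 || strncmp(name, "hd", 2) == 0 ||
	    strncmp(name, "vd", 2) == 0);
}

/*
 * Hypothetical helper: split "sda1" into ("sda", 1) and "loop0p1" into
 * ("loop0", 1).  A name without a partition suffix yields partition 0.
 * Returns 0 on success, -1 if the device name does not fit in 'dev'.
 */
int
split_part_name(const char *name, char *dev, size_t devlen, unsigned *part)
{
	assert(name != NULL && dev != NULL && part != NULL);

	size_t len = strlen(name);
	size_t digits = len;	/* index of the first trailing digit */

	while (digits > 0 && isdigit((unsigned char)name[digits - 1]))
		digits--;

	size_t base = len;	/* default: whole name, no partition */
	*part = 0;

	if (digits < len && digits >= 2 && name[digits - 1] == 'p' &&
	    isdigit((unsigned char)name[digits - 2])) {
		/* "<disk ending in a digit>p<N>", e.g. loop0p1 or md0p2 */
		base = digits - 1;
		*part = (unsigned)strtoul(name + digits, NULL, 10);
	} else if (digits < len && digits > 2 && has_direct_numbering(name)) {
		/* "<sd|hd|vd><letters><N>", e.g. sda1 */
		base = digits;
		*part = (unsigned)strtoul(name + digits, NULL, 10);
	}

	if (base >= devlen)
		return (-1);
	memcpy(dev, name, base);
	dev[base] = '\0';
	return (0);
}
```

The point of the sketch is the failure mode named in the message: a parser that treats every trailing digit as a partition number would report loop0 as device "loop" with partition 0, so the digit that belongs to the disk name has to be told apart from the partition suffix.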

Jan 09, 2014

  1. Brian Behlendorf

    Define the needed ISA types for Sparc

    Add the minimum required ISA types to support the Sparc
    architecture.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Signed-off-by: marku89 <mar42@kola.li>
    Issue #1700
    authored

Dec 19, 2013

  1. Michael Kjörling

    cstyle: Resolve C style issues

    The vast majority of these changes are in Linux specific code.
    They are the result of not having an automated style checker to
    validate the code when it was originally written.  Others were
    caused when the common code was slightly adjusted for Linux.
    
    This patch contains no functional changes.  It only refreshes
    the code to conform to the style guide.
    
    Everyone submitting patches for inclusion upstream should now
    run 'make checkstyle' and resolve any warnings prior to opening
    a pull request.  The automated builders have been updated to
    fail a build when 'make checkstyle' detects an issue.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #1821
    mkjorling authored committed

Oct 10, 2013

  1. Richard Yao

    Generate libraries with correct DT_NEEDED entries

    Libraries that depend on other libraries should list them in ELF's
    DT_NEEDED field so that programs linking to them do not need to specify
    those libraries unless they depend on them as well. This is not the case
    in the current code and the consequence is that anything that needs a
    library must know its dependencies. This is fragile and caused GRUB2's
    configure script to break when a dependency was added on libblkid in
    libzfs.
    
    This resolves that problem by using LIBADD/LDADD to specify libraries in
    Makefile.am instead of LDFLAGS. This ensures that proper DT_NEEDED
    entries are generated and prevents GRUB2's configure script from
    breaking in the presence of a libblkid dependency. This also removes
    unneeded dependencies from various files.
    
    Signed-off-by: Richard Yao <ryao@gentoo.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #1751
    ryao authored committed

Aug 27, 2012

  1. Brian Behlendorf

    Remove autotools products

    Remove all of the generated autotools products from the repository
    and update the .gitignore files accordingly.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #718
    authored

Aug 07, 2012

  1. Etienne Dechamps

    Set zvol discard_granularity to the volblocksize.

    Currently, zvols have a discard granularity set to 0, which suggests to
    the upper layer that discard requests of arbitrarily small size and
    alignment can be made efficiently.
    
    In practice however, ZFS does not handle unaligned discard requests
    efficiently: indeed, it is unable to free a part of a block. It will
    write zeros to the specified range instead, which is both useless and
    inefficient (see dnode_free_range).
    
    With this patch, zvol block devices expose volblocksize as their discard
    granularity, so the upper layer is aware that it's not supposed to send
    discard requests smaller than volblocksize.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #862
    dechamps authored committed

Jul 23, 2012

  1. Richard Yao

    Linux 3.5 compat, end_writeback() changed to clear_inode()

    The end_writeback() function was changed by moving the call to
    inode_sync_wait() earlier, into evict().  This effectively changes
    the ordering of the sync but it does not impact the details of
    the zfs implementation.
    
    However, as part of this change end_writeback() was renamed to
    clear_inode() to reflect the new semantics.  This change does
    impact us and clear_inode() now maps to end_writeback() for
    kernels prior to 3.5.
    
    Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #784
    ryao authored committed
  2. Richard Yao

    Linux 3.5 compat, iops->truncate_range() removed

    The vmtruncate_range() support has been removed from the kernel in
    favor of using the fallocate method in the file_operations table.
    
    Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #784
    ryao authored committed
  3. Richard Yao

    Linux 3.5 compat, eops->encode_fh() takes inodes

    The export_operations member ->encode_fh() has been updated to
    take both the child and parent inodes.  This interface used to
    take the child dentry and a bool describing if the parent is needed.
    
    NOTE: While updating this code I noticed that we do not currently
    cleanly handle the case where we're passed a connectable parent.
    This code should be audited to make sure we're doing the right thing.
    
    Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #784
    ryao authored committed

Jul 17, 2012

  1. Etienne Dechamps

    Move partition scanning from userspace to module.

    Currently, zpool online -e (dynamic vdev expansion) doesn't work on
    whole disks because we're invoking ioctl(BLKRRPART) from userspace
    while ZFS still has a partition open on the disk, which results in
    EBUSY.
    
    This patch moves the BLKRRPART invocation from the zpool utility to the
    module. Specifically, this is done just before opening the device in
    vdev_disk_open() which is called inside vdev_reopen(). This requires
    jumping through some hoops to get to the disk device from the partition
    device, and to make sure we can still open the partition after the
    BLKRRPART call.
    
    Note that this new code path is triggered on dynamic vdev expansion
    only; other actions, like creating a new pool, are unchanged and still
    call BLKRRPART from userspace.
    
    This change also depends on API changes which are available in 2.6.37
    and later kernels.  The build system has been updated to detect this,
    but there is no compatibility mode for older kernels.  This means that
    online expansion will NOT be available in older kernels.  However, it
    will still be possible to expand the vdev offline.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #808
    dechamps authored committed

Jul 12, 2012

  1. Brian Behlendorf

    Add PowerPC to supported VTOCs

    This code was inherited from Solaris which was careful to define
    the expected VTOC for various supported architectures.  While this
    check may have made sense there it's something we should be able to
    safely drop under Linux.
    
    However, I'm not quite ready to do that yet.  So for the moment I'm
    just doing the very safe thing of adding PowerPC as a supported type.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    authored
  2. Etienne Dechamps

    Fix efi_use_whole_disk() when efi_nparts == 128.

    Commit e5dc681 changed EFI_NUMPAR from 9 to 128. This means that the
    on-disk EFI label has efi_nparts = 128 instead of 9. The index of the
    reserved partition, however, is still 8. This breaks
    efi_use_whole_disk(), which uses efi_nparts-1 as the index of the
    reserved partition.
    
    This commit fixes efi_use_whole_disk() when the index of the reserved
    partition is not efi_nparts-1. It rewrites the algorithm and makes it
    more robust by using the order of the partitions instead of their
    numbering. It assumes that the last non-empty partition is the reserved
    partition, and that the non-empty partition before that is the data
    partition.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #808
    dechamps authored committed
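
The order-based selection described in the efi_use_whole_disk() fix can be sketched as follows. struct part_entry and find_data_and_reserved() are hypothetical stand-ins rather than the libefi types; the only rule taken from the commit message is that the last non-empty partition is treated as the reserved partition and the non-empty partition before it as the data partition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a label entry; a size of 0 marks an empty slot. */
struct part_entry {
	uint64_t start;
	uint64_t size;
};

/*
 * Hypothetical helper: walk the table from the end and pick the last
 * two non-empty entries.  Returns 0 on success, -1 if fewer than two
 * non-empty partitions exist.
 */
int
find_data_and_reserved(const struct part_entry *parts, size_t nparts,
    size_t *data, size_t *resv)
{
	assert(parts != NULL && data != NULL && resv != NULL);

	int found_resv = 0;

	for (size_t i = nparts; i-- > 0; ) {
		if (parts[i].size == 0)
			continue;
		if (!found_resv) {
			*resv = i;	/* last non-empty entry */
			found_resv = 1;
		} else {
			*data = i;	/* non-empty entry before it */
			return (0);
		}
	}
	return (-1);
}
```

Because the search depends on ordering rather than on nparts - 1, the same result comes back whether the table has 9 or 128 slots.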

Jun 11, 2012

  1. Richard Yao

    Linux 3.4 compat, d_make_root() replaces d_alloc_root()

    torvalds/linux@adc0e91 introduced d_make_root() as a replacement for
    d_alloc_root(). Further commits appear to have removed d_alloc_root()
    from the Linux source tree. This causes the following failure:
    
      error: implicit declaration of function 'd_alloc_root'
      [-Werror=implicit-function-declaration]
    
    To correct this we update the code to use the current d_make_root()
    interface for readability.  Then we introduce an autotools check
    to determine if d_make_root() is available.  If it isn't then we
    define some compatibility logic which uses the older d_alloc_root()
    interface.
    
    Signed-off-by: Richard Yao <ryao@gentoo.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #776
    ryao authored committed

May 03, 2012

  1. lundman

    Define the needed ISA types for ARM

    Add the minimum required ISA types to support the ARM architecture.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    lundman authored committed

Apr 30, 2012

  1. Brian Behlendorf

    Linux 3.3 compat, iops->create()/mkdir()/mknod()

    The mode argument of iops->create()/mkdir()/mknod() was changed from
    an 'int' to a 'umode_t'.  To prevent a compiler warning an autoconf
    check was added to detect the API change and then correctly set a
    zpl_umode_t typedef.  There is no functional change.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #701
    authored

Mar 23, 2012

  1. Brian Behlendorf

    Add --enable-debug-dmu-tx configure option

    Allow rigorous (and expensive) tx validation to be enabled/disabled
    independently from the standard zfs debugging.  When enabled these
    checks ensure that all txs are constructed properly and that a dbuf
    is never dirtied without taking the correct tx hold.
    
    This checking is particularly helpful when adding new dmu consumers
    like Lustre.  However, for established consumers such as the zpl
    with no known outstanding tx construction problems this is just
    overhead.
    
    --enable-debug-dmu-tx  - Enable/disable validation of each tx as
    --disable-debug-dmu-tx   it is constructed.  By default validation
                             is disabled due to performance concerns.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    authored

Mar 22, 2012

  1. Brian Behlendorf

    Add .zfs control directory

    Add support for the .zfs control directory.  This was accomplished
    by leveraging as much of the existing ZFS infrastructure as possible
    and updating it for Linux as required.  The bulk of the core
    functionality is now all there with the following limitations.
    
    *) The .zfs/snapshot directory automount support requires a 2.6.37
       or newer kernel.  The exception is RHEL6.2 which has backported
       the d_automount patches.
    
    *) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
       in the .zfs/snapshot directory works as expected.  However,
       this functionality is only available to root until zfs
       delegations are finished.
    
          * mkdir - create a snapshot
          * rmdir - destroy a snapshot
          * mv    - rename a snapshot
    
    The following issues are known deficiencies, but we expect them to
    be addressed by future commits.
    
    *) Add automount support for kernels older than 2.6.37.  This should
       be possible using follow_link() which is what Linux did before.
    
    *) Accessing the .zfs/snapshot directory via NFS is not yet possible.
       The majority of the ground work for this is complete.  However,
       finishing this work will require resolving some lingering
       integration issues with the Linux NFS kernel server.
    
    *) The .zfs/shares directory exists but no further smb functionality
       has yet been implemented.
    
    Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
    Contributions-by: Andrew Barnes <barnes333@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #173
    authored

Feb 27, 2012

  1. Brian Behlendorf

    Cleanly support debug packages

    Allow a source rpm to be rebuilt with debugging enabled.  This
    avoids the need to have to manually modify the spec file.  By
    default debugging is still largely disabled.  To enable specific
    debugging features use the following options with rpmbuild.
    
      '--with debug'               - Enables ASSERTs
    
      # For example:
      $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm
    
    Additionally, ZFS_CONFIG has been added to zfs_config.h for
    packages which build against these headers.  This is critical
    to ensure both zfs and the dependent package are using the same
    prototype and structure definitions.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    authored

Feb 10, 2012

  1. Etienne Dechamps

    Add support for DISCARD to ZVOLs.

    DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
    It allows ZVOL clients to discard (unmap, trim) block ranges from
    a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
    shrink instead of just grow.
    
    We can't use zfs_space() or zfs_freesp() here, since these functions
    only work on regular files, not volumes. Fortunately we can use the
    low-level function dmu_free_long_range() which does exactly what we
    want.
    
    Currently the discard operation is not added to the log. That's not
    a big deal since losing discard requests cannot result in data
    corruption. It would however result in disk space usage higher than
    it should be. Thus adding log support to zvol_discard() is probably
    a good idea for a future improvement.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    dechamps authored committed
  2. Etienne Dechamps

    Support the fallocate() file operation.

    Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is
    supported, since it's the only one that matches the behavior of
    zfs_space(). This makes it pretty much useless in its current
    form, but it's a start.
    
    To support other flag combinations we would need to modify
    zfs_space() to make it more flexible, or emulate the desired
    functionality in zpl_fallocate().
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #334
    dechamps authored committed

Feb 08, 2012

  1. Etienne Dechamps

    Improve ZVOL queue behavior.

    The Linux block device queue subsystem exposes a number of configurable
    settings described in Linux block/blk-settings.c. The defaults for these
    settings are tuned for hard drives, and are not optimized for ZVOLs. Proper
    configuration of these options would allow upper layers (I/O scheduler) to
    make better decisions about write merging and ordering.
    
    Detailed rationale:
    
     - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to
       handle writes of any size, so there's no reason to impose a limit. Let the
       upper layer decide.
    
     - max_segments and max_segment_size are set to unlimited. zvol_write() will
       copy the requests' contents into a dbuf anyway, so the number and size of
       the segments are irrelevant. Let the upper layer decide.
    
     - physical_block_size and io_opt are set to the ZVOL's block size. This
       has the potential to somewhat alleviate issue #361 for ZVOLs, by warning
       the upper layers that writes smaller than the volume's block size will be
       slow.
    
     - The NONROT flag is set to indicate this isn't a rotational device.
       Although the backing zpool might be composed of rotational devices, the
       resulting ZVOL often doesn't exhibit the same behavior due to the COW
       mechanisms used by ZFS. Setting this flag will prevent upper layers from
       making useless decisions (such as reordering writes) based on incorrect
       assumptions about the behavior of the ZVOL.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    dechamps authored committed
  2. Etienne Dechamps

    Fix synchronicity for ZVOLs.

    zvol_write() assumes that the write request must be written to stable storage
    if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed,
    "sync" does *not* mean what we think it means in the context of the Linux
    block layer. This is well explained in linux/fs.h:
    
        WRITE:       A normal async write. Device will be plugged.
        WRITE_SYNC:  Synchronous write. Identical to WRITE, but passes down
                     the hint that someone will be waiting on this IO
                     shortly.
        WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush.
        WRITE_FUA:   Like WRITE_SYNC but data is guaranteed to be on
                     non-volatile media on completion.
    
    In other words, SYNC does not *mean* that the write must be on stable storage
    on completion. It just means that someone is waiting on us to complete the
    write request. Thus triggering a ZIL commit for each SYNC write request on a
    ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL
    users have no way to express that they actually want data to be written to
    stable storage, which means the ZIL is broken for ZVOLs.
    
    The request for stable storage is expressed by the FUA flag, so we must
    commit the ZIL after the write if the FUA flag is set. In addition, we must
    commit the ZIL before the write if the FLUSH flag is set.
    
    Also, we must inform the block layer that we actually support FLUSH and FUA.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    dechamps authored committed
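
The rule stated in the synchronicity fix can be written as a small decision function. The flag bits and names below are hypothetical (the kernel's real request flags differ between versions); the logic is only what the message states: FLUSH means commit the ZIL before the write, FUA means commit it after, and SYNC alone requires neither.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
enum {
	RQ_SYNC = 1 << 0,	/* someone is waiting on this request */
	RQ_FLUSH = 1 << 1,	/* flush the cache before the write */
	RQ_FUA = 1 << 2,	/* data must be stable on completion */
	RQ_ALL = RQ_SYNC | RQ_FLUSH | RQ_FUA
};

struct commit_plan {
	bool before;	/* commit the log before issuing the write */
	bool after;	/* commit the log after the write completes */
};

struct commit_plan
plan_log_commits(unsigned flags)
{
	struct commit_plan plan;

	/* Unknown bits indicate a caller bug in this sketch. */
	assert((flags & ~(unsigned)RQ_ALL) == 0);

	plan.before = (flags & RQ_FLUSH) != 0;
	plan.after = (flags & RQ_FUA) != 0;
	return (plan);
}
```

A request carrying only the SYNC hint yields a plan with no commits at all, which is the performance gain the message points to.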

Feb 03, 2012

  1. Brian Behlendorf

    Linux 3.3 compat, sops->show_options()

    The second argument of sops->show_options() was changed from a
    'struct vfsmount *' to a 'struct dentry *'.  Add an autoconf check
    to detect the API change and then conditionally define the expected
    interface.  In either case we are only interested in the zfs_sb_t.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #549
    authored

Jan 17, 2012

  1. Darik Horn

    Combine libraries: spl, avl, efi, share, unicode.

    These libraries, which are an artifact of the ZoL development
    process, conflict with packages that are already in distribution:
    
      * libspl: SPL Programming Language
      * libavl: AVL for Linux
      * libefi: GRUB
    
    And these libraries are potential conflicts:
    
      * libshare: the Linux Mount Manager
      * libunicode: Perl and Python
    
    Recompose these five ZoL components into the four libraries that are
    conventionally provided by Solaris and FreeBSD systems:
    
      + libnvpair
      + libuutil
      + libzpool
      + libzfs
    
    This change resolves the name conflict, makes ZoL more compatible
    with existing software that uses autotools to detect ZFS, and allows
    pkg-zfs to better reflect the official Debian kFreeBSD packaging.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes: #430
    dajhorn authored committed

Jan 12, 2012

  1. Richard Laager

    Treat /dev/vd* as whole disks

    Correctly detect /dev/vd devices as whole disks and attempt to
    create an EFI partition table.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    rlaager authored committed

Jan 11, 2012

  1. Brian Behlendorf

    Linux 3.1 compat, super_block->s_shrink

    The Linux 3.1 kernel has introduced the concept of per-filesystem
    shrinkers which are directly associated with a super block.  Prior
    to this change there was one shared global shrinker.
    
    The zfs code relied on being able to call the global shrinker when
    the arc_meta_limit was exceeded.  This would cause the VFS to drop
    references on a fraction of the dentries in the dcache.  The ARC
    could then safely reclaim the memory used by these entries and
    honor the arc_meta_limit.  Unfortunately, when per-filesystem
    shrinkers were added the old interfaces were made unavailable.
    
    This change adds support to use the new per-filesystem shrinker
    interface so we can continue to honor the arc_meta_limit.  The
    major benefit of the new interface is that we can now target
    only the zfs filesystem for dentry and inode pruning.  Thus we
    can minimize any impact on the caching of other filesystems.
    
    In the context of making this change several other important
    issues related to managing the ARC were addressed, they include:
    
    * The dnlc_reduce_cache() function which was called by the ARC
    to drop dentries for the Posix layer was replaced with a generic
    zfs_prune_t callback.  The ZPL layer now registers a callback to
    drop these dentries removing a layering violation which dates
    back to the Solaris code.  This callback can also be used by
    other ARC consumers such as Lustre.
    
      arc_add_prune_callback()
      arc_remove_prune_callback()
    
    * The arc_reduce_dnlc_percent module option has been changed to
    arc_meta_prune for clarity.  The dnlc functions are specific to
    Solaris's VFS and have already been largely eliminated.
    The replacement tunable now represents the number of bytes the
    prune callback will request when invoked.
    
    * Less aggressively invoke the prune callback.  We used to call
    this whenever we exceeded the arc_meta_limit; however, that's not
    strictly correct since it results in overzealous reclaim of
    dentries and inodes.  It is now only called once the arc_meta_limit
    is exceeded and every effort has been made to evict other data from
    the ARC cache.
    
    * More promptly manage exceeding the arc_meta_limit.  When reading
    meta data into the cache, if a buffer cannot be recycled,
    notify the arc_reclaim thread to invoke the required prune.
    
    * Added arcstat_prune kstat which is incremented when the ARC
    is forced to request that a consumer prune its cache.  Remember
    this will only occur when the ARC has no other choice.  If it
    can evict buffers safely without invoking the prune callback
    it will.
    
    * This change is also expected to resolve the unexpected collapses
    of the ARC cache.  These would occur because, when just the
    arc_meta_limit was exceeded, reclaim pressure would be exerted on
    the arc_c value via arc_shrink().  This effectively shrunk the entire
    cache when really we just needed to reclaim meta data.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #466
    Closes #292
    authored
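
The layering idea behind the prune callback (the cache asks registered consumers to drop references instead of calling into a specific layer) can be sketched with a small callback registry. Every name below is hypothetical and the signatures are not those of arc_add_prune_callback() or arc_remove_prune_callback(); the sketch also omits the locking a real implementation would need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (prune_func_t)(int64_t bytes, void *priv);

typedef struct prune {
	prune_func_t *p_func;
	void *p_priv;
	struct prune *p_next;
} prune_t;

static prune_t *prune_list;	/* registered consumers, newest first */

/* Register a consumer; returns a handle for later removal, or NULL. */
prune_t *
prune_add(prune_func_t *func, void *priv)
{
	assert(func != NULL);

	prune_t *p = malloc(sizeof (*p));
	if (p == NULL)
		return (NULL);
	p->p_func = func;
	p->p_priv = priv;
	p->p_next = prune_list;
	prune_list = p;
	return (p);
}

/* Unregister a consumer and release its handle. */
void
prune_remove(prune_t *p)
{
	for (prune_t **pp = &prune_list; *pp != NULL; pp = &(*pp)->p_next) {
		if (*pp == p) {
			*pp = p->p_next;
			free(p);
			return;
		}
	}
}

/* Ask every registered consumer to release roughly 'bytes' of cache. */
void
prune_invoke(int64_t bytes)
{
	for (prune_t *p = prune_list; p != NULL; p = p->p_next)
		p->p_func(bytes, p->p_priv);
}

/* Example consumer: records how much it was asked to release. */
void
count_requested(int64_t bytes, void *priv)
{
	*(int64_t *)priv += bytes;
}
```

The cache side only ever sees prune_invoke(); which dentries or inodes get dropped is decided entirely inside the consumer's callback.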

Dec 17, 2011

  1. Darik Horn

    Linux 3.2 compat: set_nlink()

    Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit
    
      SHA: bfe8684
    
    Use the new set_nlink() kernel function instead.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes: #462
    dajhorn authored committed

Dec 15, 2011

  1. Prakash Surya

    Add make rule for building Arch Linux packages

    Added the necessary build infrastructure for building packages
    compatible with the Arch Linux distribution. As such, one can now run:
    
        $ ./configure
        $ make pkg     # Alternatively, one can run 'make arch' as well
    
    on the Arch Linux machine to create two binary packages compatible with
    the pacman package manager, one for the zfs userland utilities and
    another for the zfs kernel modules. The new packages can then be
    installed by running:
    
        # pacman -U $package.pkg.tar.xz
    
    In addition, source-only packages suitable for an Arch Linux chroot
    environment or remote builder can also be built using the 'sarch' make
    rule.
    
    NOTE: Since the source dist tarball is created on the fly from the head
    of the build tree, its MD5 hash signature will be continually in flux.
    As a result, the md5sum variable was intentionally omitted from the
    PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
    or may not have any serious security implications, as the source tarball
    is not being downloaded from an outside source.
    
    Signed-off-by: Prakash Surya <surya1@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #491
    prakashsurya authored committed

Nov 08, 2011

  1. Brian Behlendorf

    Simplify BDI integration

    Update the code to use the bdi_setup_and_register() helper to
    simplify the bdi integration code.  The updated code now just
    registers the bdi during mount and destroys it during unmount.
    
    The only complication is that for 2.6.32 - 2.6.33 kernels the
    helper wasn't available so in these cases the zfs code must
    provide it.  Luckily the bdi_setup_and_register() function
    is trivial.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #367
    authored

Sep 26, 2011

  1. Zachary Bedell

    Make libefi-created GPT compatible with gptfdisk

    GPTs created by libefi set the HeaderSize attribute in the GPT
    header to 512 -- size of the GPT header INCLUDING the 420 padding
    bytes at the end.  Most other tools set the size to 92 -- size of
    the actual header itself excluding the padding.  Most tools check
    the recorded HeaderSize when verifying CRC, but gptfdisk hardcodes
    92 and thus reports CRC verification problems for full-disk vdevs
    created, e.g., with `zpool create pool sdc`.
    
    This commit changes libefi's behavior for GPT creation and also
    fixes several edge cases where libefi's behavior was similar
    (though in an incompatible manner) to gptfdisk.  Libefi assumed
    HeaderSize was always 512 even if the GPT recorded a different
    value.  Sanity checks of the GPT headersize read from disk were
    added before applying checksum calculation -- this will prevent
    segfault in cases of bogus on-disk values.
    
    Zpools created with the resulting libefi are verified as correct
    both by parted and gptfdisk.  Pools have also been tested to
    import correctly on ZFS on Linux as well as the Solaris Express 11
    livecd.
    
    Signed-off-by: Zachary Bedell <zac@thebedells.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #344
    pendor authored committed
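
The sanity check described above (validate the recorded HeaderSize before using it as a checksum length) can be sketched as follows. This is not the libefi code: gpt_header_crc() and crc32_ieee() are hypothetical helpers, and the field offsets (HeaderSize at byte 12, HeaderCRC32 at byte 16, both little-endian) are taken from the standard GPT header layout rather than from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	GPT_HDR_MIN_SIZE	92u	/* header proper, without padding */
#define	GPT_HDR_SIZE_OFF	12u	/* HeaderSize field, little-endian */
#define	GPT_HDR_CRC_OFF		16u	/* HeaderCRC32 field, 4 bytes */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), state-passing form. */
static uint32_t
crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (crc);
}

uint32_t
crc32_ieee(const uint8_t *buf, size_t len)
{
	return (~crc32_update(0xFFFFFFFFu, buf, len));
}

/*
 * Hypothetical helper: checksum the header using the size recorded on
 * disk, treating the CRC field itself as zero.  Returns -1, without
 * reading past the sector, when the recorded size is bogus.
 */
int
gpt_header_crc(const uint8_t *sector, uint32_t sector_size, uint32_t *crc)
{
	static const uint8_t zeros[4] = { 0, 0, 0, 0 };

	assert(sector != NULL && crc != NULL);
	if (sector_size < GPT_HDR_MIN_SIZE)
		return (-1);

	uint32_t hdr_size = (uint32_t)sector[GPT_HDR_SIZE_OFF] |
	    (uint32_t)sector[GPT_HDR_SIZE_OFF + 1] << 8 |
	    (uint32_t)sector[GPT_HDR_SIZE_OFF + 2] << 16 |
	    (uint32_t)sector[GPT_HDR_SIZE_OFF + 3] << 24;

	if (hdr_size < GPT_HDR_MIN_SIZE || hdr_size > sector_size)
		return (-1);

	uint32_t c = crc32_update(0xFFFFFFFFu, sector, GPT_HDR_CRC_OFF);
	c = crc32_update(c, zeros, sizeof (zeros));
	c = crc32_update(c, sector + GPT_HDR_CRC_OFF + 4,
	    hdr_size - (GPT_HDR_CRC_OFF + 4));
	*crc = ~c;
	return (0);
}
```

Rejecting a recorded size below 92 or above the sector size is what keeps a corrupt header from driving the checksum loop past the end of the buffer.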

Aug 08, 2011

  1. Brian Behlendorf

    Autogen refresh for udev changes

    Run autogen.sh using the same autotools versions as upstream:
    
     * autoconf-2.63
     * automake-1.11.1
     * libtool-2.2.6b
    authored

Aug 04, 2011

  1. Brian Behlendorf

    Add backing_device_info per-filesystem

    For a long time now the kernel has been moving away from using the
    pdflush daemon to write 'old' dirty pages to disk.  The primary reason
    for this is because the pdflush daemon is single threaded and can be
    a limiting factor for performance.  Since pdflush sequentially walks
    the dirty inode list for each super block any delay in processing can
    slow down dirty page writeback for all filesystems.
    
    The replacement for pdflush is called bdi (backing device info).  The
    bdi system involves creating a per-filesystem control structure each
    with its own private sets of queues to manage writeback.  The advantage
    is greater parallelism which improves performance and prevents a single
    filesystem from slowing writeback to the others.
    
    For a long time both systems co-existed in the kernel so it wasn't
    strictly required to implement the bdi scheme.  However, as of
    Linux 2.6.36 kernels the pdflush functionality has been retired.
    
    Since ZFS already bypasses the page cache for most I/O this is only
    an issue for mmap(2) writes which must go through the page cache.
    Even then adding this missing support for newer kernels was overlooked
    because there are other mechanisms which can trigger writeback.
    
    However, there is one critical case where not implementing the bdi
    functionality can cause problems.  If an application handles a page
    fault it can enter the balance_dirty_pages() callpath.  This will
    result in the application hanging until the number of dirty pages in
    the system drops below the dirty ratio.
    
    Without a registered backing_device_info for the filesystem the
    dirty pages will not get written out.  Thus the application will hang.
    As mentioned above this was less of an issue with older kernels because
    pdflush would eventually write out the dirty pages.
    
    This change adds a backing_device_info structure to the zfs_sb_t
    which is already allocated per-super block.  It is then registered
    when the filesystem is mounted and unregistered on unmount.  It will
    not be registered for mounted snapshots which are read-only.  This
    change will result in a flush-<pool> thread being dynamically created
    and destroyed per-mounted filesystem for writeback.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #174
    authored

Jul 11, 2011

  1. Kyle Fuller

    Provide a rc.d script for archlinux

    Unlike most other Linux distributions archlinux installs its
    init scripts in /etc/rc.d instead of /etc/init.d.  This commit
    provides an archlinux rc.d script for zfs and extends the
    build infrastructure to ensure it gets installed in the
    correct place.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #322
    kylef authored committed

Jul 06, 2011

  1. Brian Behlendorf

    Add proper library versioning

    The zfs libraries were never properly versioned.  Since the API has
    remained static for quite some time this was never an issue.  However,
    going forward they should be versioned.  This commit versions all
    of the libraries to 1.0.0.  From here on out this version must be
    updated to reflect changes to the library.
    authored

Jul 01, 2011

  1. Brian Behlendorf

    Linux compat 2.6.39: mount_nodev()

    The .get_sb callback has been replaced by a .mount callback
    in the file_system_type structure.  When using the new
    interface the caller must now use the mount_nodev() helper.
    
    Unfortunately, the new interface no longer passes the vfsmount
    down to the zfs layers.  This poses a problem for the existing
    implementation because we currently save this pointer in the
    super block for later use.  It provides our only entry point
    into the namespace layer for manipulating certain mount options.
    
    This needed to be done originally to allow commands like
    'zfs set atime=off tank' to work properly.  It also allowed me
    to keep more of the original Solaris code unmodified.  Under
    Solaris there is a 1-to-1 mapping between a mount point and a
    file system so this is a fairly natural thing to do.  However,
    under Linux there may be multiple entries in the namespace
    which reference the same filesystem.  Thus keeping a back
    reference from the filesystem to the namespace is complicated.
    
    Rather than introduce some ugly hack to get the vfsmount and
    continue as before, I'm leveraging this API change to update
    the ZFS code to do things in a more natural way for Linux.
    This has the upside that it resolves the compatibility issue
    for the long term and fixes several other minor bugs which
    have been reported.
    
    This commit updates the code to remove this vfsmount back
    reference entirely.  All modifications to filesystem mount
    options are now passed in to the kernel via a '-o remount'.
    This is the expected Linux mechanism and allows the namespace
    to properly handle any options which apply to it before passing
    them on to the file system itself.
    
    Aside from fixing the compatibility issue, removing the
    vfsmount has had the benefit of simplifying the code.  This
    change, while fairly involved, has turned out nicely.
    
    Closes #246
    Closes #217
    Closes #187
    Closes #248
    Closes #231
    authored