Commits on Sep 8, 2020

  1. LU-13553 lnd: gracefully handle unexpected events

    When a tx completes, the kiblnd_tx_complete() callback is invoked.
    We ensure:
    LASSERT (tx->tx_sending > 0);
    However, this assert is being triggered in some rare scenarios.
    tx_sending could be 0 at this point because:
     1. ib_post_send() failed but the OFED stack still sends
        a tx complete event.
     2. We're getting two different events for the same tx.
    
    Instead of asserting, ignore that tx_complete event and print
    the tx pointer and its status.
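    
    As a rough illustration only (this is not the code from the patch),
    the change from asserting to ignoring can be sketched as follows. The
    struct sketch_tx and the function sketch_tx_complete() are
    hypothetical stand-ins for the real LND types.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the LND transmit descriptor. */
struct sketch_tx {
	int tx_sending;	/* number of sends still in flight */
	int tx_status;	/* last recorded completion status */
};

/*
 * Handle a completion event. Returns 1 if the event was accounted for,
 * 0 if it was unexpected and therefore ignored.
 */
static int sketch_tx_complete(struct sketch_tx *tx, int status)
{
	assert(tx != NULL);

	if (tx->tx_sending <= 0) {
		/* Formerly an assertion failure; now logged and ignored. */
		fprintf(stderr, "unexpected completion: tx %p status %d\n",
			(void *)tx, status);
		return 0;
	}

	tx->tx_sending--;
	tx->tx_status = status;
	return 1;
}
```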
    
    Lustre-change: https://review.whamcloud.com/38669
    Lustre-commit: 60f9f53
    
    Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
    Change-Id: I8cd192538c0c80abaef23a4b6e6906936043060b
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
    Signed-off-by: Minh Diep <mdiep@whamcloud.com>
    Reviewed-on: https://review.whamcloud.com/38752
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Amir Shehata authored and ofaaland committed Sep 8, 2020
    c292aa5
  2. LU-13600 ptlrpc: limit rate of lock replays

    Clients send all lock replays at once, which may overwhelm the
    server with a huge number of replays in the recovery queue and
    cause OOM conditions.
    
    This patch adds rate control for lock replays on the client.
    
    It also includes the later fix for the signal_completed_replay()
    race.
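    
    The message does not say how the rate control works. One common
    form, shown here purely as an assumption, is a cap on the number of
    replays in flight. The struct, the function names and the limit are
    all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limiter state; the cap value is chosen by the caller. */
struct replay_limiter {
	int inflight;		/* replays sent but not yet answered */
	int max_inflight;	/* cap on concurrent replays */
};

/* Try to start one replay; returns 1 if allowed, 0 if the caller must wait. */
static int replay_try_start(struct replay_limiter *rl)
{
	assert(rl != NULL);

	if (rl->inflight >= rl->max_inflight)
		return 0;
	rl->inflight++;
	return 1;
}

/* Called once the server has answered one replay. */
static void replay_done(struct replay_limiter *rl)
{
	assert(rl != NULL && rl->inflight > 0);
	rl->inflight--;
}
```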
    
    Lustre-change: https://review.whamcloud.com/38920
    Lustre-commit: 3b613a4
    
    Lustre-change: https://review.whamcloud.com/39140
    Lustre-commit: dc654756af63bd30802ebd86074019d1533a4d8f
    
    Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
    Change-Id: Ie557f8481c5facb690468d7136cf5feebe4e8f11
    Reviewed-on: https://review.whamcloud.com/39111
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Mikhail Pershin authored and ofaaland committed Sep 8, 2020
    6d020da
  3. LU-13088 ldlm: Fix sleeping function called in atomic

    target_recovery_overseer() can sleep while holding a spinlock, which
    triggers a BUG warning.
    
    It is easily fixed by dropping the spinlock before waiting.  In the
    case where the task waits, no useful information that could be
    protected by the spinlock is held, so nothing can be lost by dropping
    it.
    
    Lustre-change: https://review.whamcloud.com/#/c/37063/
    Lustre-commit: b29b931
    
    Signed-off-by: Mr NeilBrown <neilb@suse.de>
    Change-Id: I8bb3d02523b5dcfadac19f01ccb736d7b7f28239
    Reviewed-on: https://review.whamcloud.com/37063
    Tested-by: jenkins <devops@whamcloud.com>
    Reviewed-by: James Simmons <jsimmons@infradead.org>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Reviewed-on: https://review.whamcloud.com/39283
    neilbrown authored and ofaaland committed Sep 8, 2020
    56f2391
  4. LU-12424 lnet: prevent loop in LNetPrimaryNID()

    If discovery is disabled locally or at the remote end, then attempt
    discovery only once. Do not update the internal database when
    discovery is disabled and do not repeat discovery.
    
    This change prevents LNet from getting hung waiting for
    discovery to complete.
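    
    A toy model of the loop described above, with hypothetical names
    and a simulated peer, might look like this. It only illustrates the
    "attempt once when discovery is disabled" rule and is not the
    LNetPrimaryNID() code.

```c
#include <assert.h>
#include <stddef.h>

struct sk_peer {
	int attempts;		/* discovery attempts made so far */
	int uptodate_after;	/* attempt count at which the peer answers */
};

/* One discovery round; returns 1 once the peer is considered up to date. */
static int sk_discover(struct sk_peer *lp)
{
	lp->attempts++;
	return lp->attempts >= lp->uptodate_after;
}

/* Returns the number of discovery attempts made before stopping. */
static int sk_wait_for_primary_nid(struct sk_peer *lp, int discovery_disabled)
{
	assert(lp != NULL);

	for (;;) {
		if (sk_discover(lp))
			break;		/* peer is up to date */
		if (discovery_disabled)
			break;		/* do not repeat discovery */
	}
	return lp->attempts;
}
```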
    
    Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
    Change-Id: I4543b0f71e6cf297a1a5f058ebcc6bf74b8ac328
    Reviewed-on: https://review.whamcloud.com/35191
    Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
    Tested-by: Jenkins
    Reviewed-by: Chris Horn <hornc@cray.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Reviewed-on: https://review.whamcloud.com/38890
    Reviewed-by: Chris Horn <chris.horn@hpe.com>
    Tested-by: jenkins <devops@whamcloud.com>
    Amir Shehata authored and ofaaland committed Sep 8, 2020
    2e463b3
  5. LU-12222 lnet: Introduce constant for the lolnd NID

    This patch adds a new constant, LNET_NID_LO_0, to represent the lolnd
    NID 0@lo.
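    
    For orientation, such a constant could be built as sketched below.
    The SK_ macros are hypothetical, and the bit layout (LND type in the
    upper 16 bits of the network, network in the upper 32 bits of the
    NID) and the lolnd type number 9 are assumptions not stated in this
    log.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings; see the note above. */
#define SK_LOLND		9
#define SK_MKNET(type, num)	((((uint32_t)(type)) << 16) | ((uint32_t)(num)))
#define SK_MKNID(net, addr)	((((uint64_t)(net)) << 32) | ((uint64_t)(addr)))

/* Sketch of the constant: address 0 on network "lo" number 0. */
#define SK_NID_LO_0		SK_MKNID(SK_MKNET(SK_LOLND, 0), 0)

/* Returns non-zero if nid is the loopback NID 0@lo. */
static int sk_nid_is_lo0(uint64_t nid)
{
	return nid == SK_NID_LO_0;
}
```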
    
    Lustre-change: https://review.whamcloud.com/38312
    Lustre-commit: 56203e4
    
    HPE-bug-id: LUS-8457
    Signed-off-by: Chris Horn <hornc@cray.com>
    Change-Id: I3e57637f297b8de306905a447af8f025e31d1fcf
    Reviewed-on: https://review.whamcloud.com/38863
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Chris Horn authored and ofaaland committed Sep 8, 2020
    235bbb3
  6. LU-12222 lnet: Primary NID of lolnd NID is the lolnd NID

    We want Lustre traffic that is intended for the local peer to be sent
    and received over the lolnd. The function ptlrpc_uuid_to_peer() will
    currently resolve a NID to the lolnd NID, but ptlrpc_connection_get()
    will overwrite this selection with the result from LNetPrimaryNID().
    
    Have LNetPrimaryNID return the lolnd NID when it is passed the lolnd
    NID.
    
    Lustre-change: https://review.whamcloud.com/38313
    Lustre-commit: 33d2e44
    
    HPE-bug-id: LUS-8457
    Signed-off-by: Chris Horn <hornc@cray.com>
    Change-Id: I02708bb45f8440091782ca7886bac7656efb0223
    Reviewed-on: https://review.whamcloud.com/38864
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Chris Horn authored and ofaaland committed Sep 8, 2020
    2d6323c
  7. LU-12222 ptlrpc: Check if NID is local, not just lolnd NID

    There are a couple of places where we check whether a NID is the
    lolnd NID, but we really want to know whether the NID is local. Use
    LNetIsPeerLocal() to accomplish this.
    
    Lustre-change: https://review.whamcloud.com/38388
    Lustre-commit: 95bcc24
    
    Signed-off-by: Chris Horn <hornc@cray.com>
    Change-Id: Ia17b9b4b54fd1063c42a6f8bdd0e593be1086683
    Reviewed-on: https://review.whamcloud.com/38865
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Chris Horn authored and ofaaland committed Sep 8, 2020
    5abcac0
  8. LU-9971 lnet: use after free in lnet_discover_peer_locked()

    When the lnet_net_lock is unlocked, the peer attached to an
    lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer)
    can change, and the old peer deallocated. If we are really
    unlucky, then all the churn could give us a new, different,
    peer at the same address in memory.
    
    Change the reference counting on the lnet_peer lp so that it
    is guaranteed to be alive when we relock the lnet_net_lock for
    the cpt. When the reference count is dropped lp may go away if
    it was unlinked, but the new peer is guaranteed to have a
    different address, so we can still correctly determine whether
    the peer changed and discovery should be redone.
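    
    The idea can be shown with a small userspace model (hypothetical
    names, no real locking): pinning the old peer with a reference keeps
    its memory allocated, so a replacement peer cannot reuse the same
    address and the pointer comparison stays meaningful.

```c
#include <assert.h>
#include <stdlib.h>

struct sk_peer { int refs; };
struct sk_peer_ni { struct sk_peer *peer; };

static struct sk_peer *sk_peer_alloc(void)
{
	struct sk_peer *p = calloc(1, sizeof(*p));

	assert(p != NULL);
	p->refs = 1;		/* reference owned by the peer_ni */
	return p;
}

static void sk_peer_get(struct sk_peer *p) { p->refs++; }

static void sk_peer_put(struct sk_peer *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		free(p);
}

/* Simulates churn while the lock is dropped: the peer_ni gets a new peer. */
static void sk_replace_peer(struct sk_peer_ni *ni)
{
	struct sk_peer *old = ni->peer;

	ni->peer = sk_peer_alloc();
	sk_peer_put(old);	/* drop the peer_ni's reference to the old peer */
}

/* Returns 1 if the peer changed while the (imaginary) lock was dropped. */
static int sk_discover(struct sk_peer_ni *ni, int churn)
{
	struct sk_peer *lp = ni->peer;
	int changed;

	sk_peer_get(lp);		/* pin lp before "unlocking" */
	if (churn)
		sk_replace_peer(ni);	/* happens while unlocked */
	changed = (ni->peer != lp);	/* valid: lp is still allocated */
	sk_peer_put(lp);		/* may free lp if it was unlinked */
	return changed;
}
```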
    
    LU-9971 lnet: fix peer ref counting
    
    Exit from the loop after peer ref count has been incremented
    to avoid wrong ref count.
    
    The code makes sure that a peer is queued for discovery at most
    once if discovery is disabled. This is done to use discovery
    as a standard ping for gateways which do not have discovery feature
    or discovery is disabled.
    
    Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
    Change-Id: Ia44dce20074b27ec0e77d7c1908c6a44ec73d326
    Reviewed-on: https://review.whamcloud.com/28944
    Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
    Tested-by: Jenkins
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: James Simmons <uja.ornl@yahoo.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Reviewed-on: https://review.whamcloud.com/38891
    Tested-by: jenkins <devops@whamcloud.com>
    Reviewed-by: Chris Horn <chris.horn@hpe.com>
    Reviewed-by: James Simmons <jsimmons@infradead.org>
    Olaf Weber authored and ofaaland committed Sep 8, 2020
    768ef4e
  9. LU-13278 lnet: Reconcile discovery push and reply handling

    Reconcile the logic for updating the multi-rail flag of a peer when
    processing a discovery PUSH with the logic used when processing a
    discovery REPLY.
    
    Cray-bug-id: LUS-8516
    Signed-off-by: Chris Horn <hornc@cray.com>
    Change-Id: Idfb4c3729822d03b71f9440ac66176ae6b886022
    Reviewed-on: https://review.whamcloud.com/37674
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
    Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
    Reviewed-by: Stephen Champion <stephen.champion@hpe.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Reviewed-on: https://review.whamcloud.com/39575
    Reviewed-by: Chris Horn <chris.horn@hpe.com>
    Chris Horn authored and ofaaland committed Sep 8, 2020
    74603d9
  10. LU-13763 osc: don't allow negative grants

    Add check in the osc_init_grant() to prevent possible
    underflow of cl_avail_grant and report an error if it happens.
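    
    A minimal sketch of such an underflow check is shown below. The
    helper name, its parameters, the -ERANGE error code and the fallback
    to zero are all assumptions for illustration; the actual check in
    osc_init_grant() may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the available grant as granted - in_use,
 * refusing to let the unsigned subtraction wrap around.
 */
static int sk_init_grant(uint64_t granted, uint64_t in_use, uint64_t *avail)
{
	assert(avail != NULL);

	if (in_use > granted) {
		/* Subtracting would underflow; report it instead. */
		*avail = 0;
		return -ERANGE;
	}
	*avail = granted - in_use;
	return 0;
}
```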
    
    Cherry-picked-from: c84fdeb5469bdc507caf1b4d5b876fce47cf5d86
    Cherry-picked-from-change: https://review.whamcloud.com/#/c/39380
    Cherry-picked-from-patch: 2
    Cherry-picked-from-branch: b2_12
    Cherry-picked-from-status: Maloo +1, Review +1
    
    Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
    Change-Id: Idcd25ed427c23735e1cdc70359bace43b5b9d886
    Mikhail Pershin authored and ofaaland committed Sep 8, 2020
    ef65452
  11. LU-12687 osc: consume grants for direct I/O

    The new IO engine implementation stopped consuming grants for direct
    I/O writes. That led to a premature out-of-space condition during
    direct I/O. The following illustrates the problem:
      # OSTSIZE=100000 sh llmount.sh
      # dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
      dd: error writing ‘/mnt/lustre/file’: No space left on device
    
    Consume grants for direct I/O.
    
    Try to consume grants in osc_queue_sync_pages() when it is called for
    pages which are being written in direct i/o.
    
    Tests are added to verify grant consumption in buffered and direct i/o
    and to verify direct i/o overwrite when ost is full.
    The overwrite test is for ldiskfs only as zfs is unable to overwrite
    when it is full.
    
    Cherry-picked-from: 8dd02c50f7df6fd0af9ead75d5d7774f32c211e2
    Cherry-picked-from-change: https://review.whamcloud.com/#/c/39386
    Cherry-picked-from-patch: 10
    Cherry-picked-from-branch: b2_12
    Cherry-picked-from-status: Maloo +1, Review +1
    
    Lustre-change: https://review.whamcloud.com/35896
    Lustre-commit: 05f326a
    
    Fixes: 9fe4b52 ("LU-1030 osc: new IO engine implementation")
    Signed-off-by: Vladimir Saveliev <c17830@cray.com>
    Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
    Cray-bug-id: LUS-7036
    Vladimir Saveliev authored and ofaaland committed Sep 8, 2020
    2fcd7cf
  12. LU-13089 osc: revert "glimpse - search for active lock"

    Revert "LU-11670 osc: glimpse - search for active lock"
    
    This could cause assertion failures like below:
    
    LustreError: 13759:0:(ldlm_lock.c:213:ldlm_lock_put())
    		ASSERTION((((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
    LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put())
    		ASSERTION( atomic_read(&lock->l_refc) > 0 ) failed:
    LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) LBUG
    
    A glimpse cb race with cancel cb.
    
    This reverts commit 2548cb9
    
    Conflicts:
    	lustre/tests/sanityn.sh
    
    Cherry-picked-from: 37205c60bb2d99363a0c9dbf29d8f4fd684b6fab
    Cherry-picked-from-change: https://review.whamcloud.com/#/c/39819
    Cherry-picked-from-patch: 1
    Cherry-picked-from-branch: b2_12
    Cherry-picked-from-status: Maloo +1, No Review
    Cherry-picked-from-notes: Removed test_103
    
    Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
    Change-Id: I12063d0b3f1411e0d44393823a3e220cea6567d5
    Bobi Jam authored and ofaaland committed Sep 8, 2020
    dc9aebb

Commits on Sep 29, 2020

  1. Revert "LU-12687 osc: consume grants for direct I/O"

    This reverts commit 2fcd7cf.
    
    Upstream has seen intermittent failures in testing
    on b2_12, consistent with the test failures seen on
    this patch.
    
    Reverting while the failures are investigated.
    
    Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
    ofaaland committed Sep 29, 2020
    21734c7

Commits on Oct 10, 2020

  1. LU-12687 osc: consume grants for direct I/O

    The new IO engine implementation stopped consuming grants for direct
    I/O writes. That led to a premature out-of-space condition during
    direct I/O. The following illustrates the problem:
      # OSTSIZE=100000 sh llmount.sh
      # dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
      dd: error writing ‘/mnt/lustre/file’: No space left on device
    
    Consume grants for direct I/O.
    
    Try to consume grants in osc_queue_sync_pages() when it is called for
    pages which are being written in direct i/o.
    
    Tests are added to verify grant consumption in buffered and direct i/o
    and to verify direct i/o overwrite when ost is full.
    The overwrite test is for ldiskfs only as zfs is unable to overwrite
    when it is full.
    
    Cherry-picked-from: 8dd02c50f7df6fd0af9ead75d5d7774f32c211e2
    Cherry-picked-from-change: https://review.whamcloud.com/#/c/39386
    Cherry-picked-from-patch: 10
    Cherry-picked-from-branch: b2_12
    Cherry-picked-from-status: Maloo +1, Review +1
    
    Lustre-change: https://review.whamcloud.com/35896
    Lustre-commit: 05f326a
    
    Fixes: 9fe4b52 ("LU-1030 osc: new IO engine implementation")
    Signed-off-by: Vladimir Saveliev <c17830@cray.com>
    Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
    Cray-bug-id: LUS-7036
    Vladimir Saveliev authored and ofaaland committed Oct 10, 2020
    663e688
  2. LU-13653 mdt: ignore quota when creating slave stripe

    When creating a striped directory, the quota limit has already been
    checked on the master MDT, so the quota should be ignored when
    creating the slave stripe object.
    
    Cherry-picked-from: 475f891
    Cherry-picked-from-change: https://review.whamcloud.com/39282
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/#/c/38875/
    Lustre-commit: f762ace
    
    Change-Id: Ia53b1975a8d66c78725feb313659f7a9b889e735
    Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
    Hongchao Zhang authored and ofaaland committed Oct 10, 2020
    0ab1c59
  3. LU-12758 quota: clear default flag for new ID

    When setting the quota limits to 0 with "lfs setquota", the default
    flag won't be cleared if the lquota_entry has just been created for
    the quota ID for the first time, because the quota limits are the same.
    
    Cherry-picked-from: hash 07aa659
    Cherry-picked-from-change: https://review.whamcloud.com/38808
    Cherry-picked-from-branch: b2_12
    
    This patch is back-ported from the following one:
    Lustre-commit: ce86e23
    Lustre-change: https://review.whamcloud.com/36236
    
    Change-Id: I7f44ce0cb13783ca5bede2f55cd0707f1ccbc8ca
    Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
    Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Hongchao Zhang authored and ofaaland committed Oct 10, 2020
    1a87281
  4. LU-13742 llite: do not bypass selinux xattr handling

    Without the hint from selinux_is_enabled() to determine if selinux
    is running at boot the performance fix from LU-549 to skip handling
    of selinux xattrs cannot be correctly handled.
    
    The correct path is to act as if selinux is enabled.
    
    This fixes a bug introduced by LU-12355 that now exists in
    RHEL 8.2 kernels where clients have enabled selinux.
    
    Cherry-picked-from: d657c96
    Cherry-picked-from-change: https://review.whamcloud.com/39671
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/39569
    Lustre-commit: 994287b
    
    Fixes: 39e5bfa ("LU-12355 llite: include file linux/selinux.h removed")
    Test-Parameters: clientdistro=el8.2 serverdistro=el8.2 clientselinux testlist=sanity-selinux
    Test-Parameters: clientdistro=el8.1 serverdistro=el8.1 clientselinux testlist=sanity-selinux
    Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
    Change-Id: I6fb5ed9ecdb79545225b5586b90509eb157a355b
    Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Signed-off-by: Minh Diep <mdiep@whamcloud.com>
    Shaun Tancheff authored and ofaaland committed Oct 10, 2020
    bbd30ea
  5. LU-13761 o2ib: Fix compilation with MOFED 5.1

    A new argument was added to rdma_reject() in MOFED 5.1 and
    Linux 5.8.
    
    Add a configure check and support both versions of rdma_reject().
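    
    The usual shape of such a compatibility shim is sketched below with
    stub functions so that it builds in userspace. The macro name
    SK_HAVE_RDMA_REJECT_4ARGS, the stubs and the reason value are
    hypothetical; a real build would define the macro from the configure
    result and call the kernel's rdma_reject().

```c
#include <assert.h>
#include <stddef.h>

/* Define to model the newer API, which takes an extra "reason" argument. */
/* #define SK_HAVE_RDMA_REJECT_4ARGS 1 */

#define SK_REJ_REASON	28	/* placeholder reason code */

struct sk_cm_id { int last_reason; };

#ifdef SK_HAVE_RDMA_REJECT_4ARGS
/* Stub modelling the newer four-argument signature. */
static int sk_rdma_reject(struct sk_cm_id *id, const void *priv,
			  unsigned char len, unsigned char reason)
{
	(void)priv; (void)len;
	id->last_reason = reason;
	return 0;
}
#else
/* Stub modelling the older three-argument signature. */
static int sk_rdma_reject(struct sk_cm_id *id, const void *priv,
			  unsigned char len)
{
	(void)priv; (void)len;
	id->last_reason = -1;	/* no reason can be passed */
	return 0;
}
#endif

/* Callers use this wrapper and never see the signature difference. */
static int sk_reject(struct sk_cm_id *id, const void *priv, unsigned char len)
{
	assert(id != NULL);
#ifdef SK_HAVE_RDMA_REJECT_4ARGS
	return sk_rdma_reject(id, priv, len, SK_REJ_REASON);
#else
	return sk_rdma_reject(id, priv, len);
#endif
}
```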
    
    Cherry-picked-from: ba702c7
    Cherry-picked-from-change: https://review.whamcloud.com/39781
    Cherry-picked-from-branch: b2_12
    
    Lustre-commit: 956deb0
    Lustre-change: https://review.whamcloud.com/39323
    
    Test-Parameters: trivial
    Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
    Change-Id: I2b28991f335658b651b21a09899b7b17ab2a9d57
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    SergeyGorenko authored and ofaaland committed Oct 10, 2020
    5069aa0
  6. LU-13187 osd-ldiskfs: don't enforce max dir size limit on IAM objects

    Add ext4-no-max-dir-size-limit-for-iam-objects.patch to introduce new
    inode state EXT4_STATE_IAM and use it to mark IAM objects.
    
    Cherry-picked-from: a73f4e5
    Cherry-picked-from-change: https://review.whamcloud.com/39882
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/39823
    Lustre-commit: 03e6db5
    
    Change-Id: I3bcc5435ea07edb9fa265dcd8e3261d849495f00
    Signed-off-by: Li Dongyang <dongyangli@ddn.com>
    Reviewed-by: Neil Brown <neilb@suse.de>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: James Simmons <jsimmons@infradead.org>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Li Dongyang authored and ofaaland committed Oct 10, 2020
    26c171e
  7. LU-13907 llite: don't set FS_REQUIRES_DEV on client

    If doing a client-only build, do not set the FS_REQUIRES_DEV flag
    for the 'lustre' filesystem type.  This is only needed on the server,
    but the filesystem type declaration is shared between both.
    
    In master, this was fixed by declaring a new 'lustre_tgt' filesystem
    type and using that for server filesystem mounts.  However, for 2.12
    this is overkill, and it is possible to get a 95% fix by dropping
    the FS_REQUIRES_DEV flag for the common case of client-only builds.
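    
    Conceptually this is a compile-time choice of flags on a shared
    declaration, roughly as in the sketch below. The struct, the macro
    SK_SERVER_BUILD and the flag values are hypothetical placeholders,
    not the kernel's definitions.

```c
#include <assert.h>

#define SK_FS_REQUIRES_DEV	0x0001	/* placeholder bit for the flag */
#define SK_FS_OTHER_FLAGS	0x8000	/* placeholder for unrelated flags */

/* Define SK_SERVER_BUILD for a build that includes server code. */
#ifdef SK_SERVER_BUILD
# define SK_LUSTRE_FS_FLAGS	(SK_FS_OTHER_FLAGS | SK_FS_REQUIRES_DEV)
#else
# define SK_LUSTRE_FS_FLAGS	(SK_FS_OTHER_FLAGS)
#endif

struct sk_fs_type {
	const char *name;
	int fs_flags;
};

/* Shared declaration; only the flag set depends on the build type. */
static const struct sk_fs_type sk_lustre_fs_type = {
	.name = "lustre",
	.fs_flags = SK_LUSTRE_FS_FLAGS,
};
```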
    
    Cherry-picked-from: 76531b7
    Cherry-picked-from-change: https://review.whamcloud.com/39674
    Cherry-picked-from-branch: b2_12
    
    Test-Parameters: trivial
    Signed-off-by: Andreas Dilger <adilger@dilger.ca>
    Change-Id: Iab2e78515aba018e2a6bceb324ad1b8a313ebbe5
    Reviewed-by: Jian Yu <yujian@whamcloud.com>
    Reviewed-by: James Simmons <jsimmons@infradead.org>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    adilger authored and ofaaland committed Oct 10, 2020
    7b17999
  8. LU-13471 lnet: use the same src nid for discovery

    When discovering a remote peer (not on the same network) a GET is
    sent to the peer to retrieve the peer's interfaces.  This is followed
    by a PUSH, if discovery is on, to push the node's interfaces. However,
    if both node and peer have multiple interfaces it is likely that the
    GET and the PUSH will originate on different interfaces. When the
    peer receives the PUSH it will not be able to connect the two NIDs
    and will not be able to consolidate the node's NIDs.  This issue is
    specific for remote peers because at the time the push handler is
    invoked the remote lpni has not been created yet. lnet_parse()
    creates the lpni of the gateway.
    
    Similar to the strategy already in place of using the same source NID
    for all the messages of an RPC, discovery should use the same source
    NID for both the GET and PUSH.
    
    This patch stores the source NID interfaces the GET was sent on and
    uses it for the PUSH.
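    
    In outline, and with hypothetical names throughout, remembering the
    GET's source NID and reusing it for the PUSH could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NID_ANY	UINT64_MAX	/* placeholder for "no NID recorded" */

struct sk_peer {
	uint64_t disc_src_nid;	/* source NID used for the discovery GET */
};

/* Send the GET from chosen_nid and remember that choice on the peer. */
static uint64_t sk_send_get(struct sk_peer *lp, uint64_t chosen_nid)
{
	assert(lp != NULL);
	lp->disc_src_nid = chosen_nid;
	return chosen_nid;
}

/* Pick the source NID for the PUSH: the GET's NID if one was recorded. */
static uint64_t sk_push_src_nid(const struct sk_peer *lp, uint64_t default_nid)
{
	assert(lp != NULL);
	return lp->disc_src_nid != SK_NID_ANY ? lp->disc_src_nid : default_nid;
}
```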
    
    Cherry-picked-from: b4b93d2 LU-13471 lnet: use the same src nid for discovery
    Cherry-picked-from-change: https://review.whamcloud.com/39576
    Cherry-picked-from-branch: b2_12
    
    Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
    Change-Id: I5a13ab7799b2ddc47714202bcbed786b0d3940b7
    Reviewed-by: Chris Horn <chris.horn@hpe.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Amir Shehata authored and ofaaland committed Oct 10, 2020
    b37725b
  9. LU-13437 lmv: check stripe FID sanity

    A striped directory layout may be broken. If any stripe FID is
    insane, return -ENODEV.
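    
    A simplified sketch of such a check follows. The struct layout and
    the sanity rule used here (non-zero sequence and object ID) are
    assumptions for illustration, not Lustre's actual FID validation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sk_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Simplified, assumed rule: a zero sequence or object ID is invalid. */
static int sk_fid_is_sane(const struct sk_fid *fid)
{
	return fid != NULL && fid->seq != 0 && fid->oid != 0;
}

/* Returns 0 if every stripe FID is sane, -ENODEV otherwise. */
static int sk_check_stripes(const struct sk_fid *stripes, int count)
{
	assert(stripes != NULL);

	for (int i = 0; i < count; i++)
		if (!sk_fid_is_sane(&stripes[i]))
			return -ENODEV;
	return 0;
}
```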
    
    Cherry-picked-from: f1712b3
    Cherry-picked-from-change: https://review.whamcloud.com/39600
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/38560
    Lustre-commit: 698a496
    
    Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
    Change-Id: I7ed8c7c561e34625e2cb29bfd14bc0ecf3fce46c
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
    Signed-off-by: Minh Diep <mdiep@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Lai Siyao authored and ofaaland committed Oct 10, 2020
    338d34e
  10. LU-13437 mdt: don't fetch LOOKUP lock for remote object

    Pack parent FID in getattr by FID, which will be used to check whether
    child is remote object on parent. The helper function is called
    mdt_is_remote_object(). NB, a directory shard is not treated as a
    remote object, because otherwise the client would need to revalidate
    shards whenever the dir is accessed, which would hurt performance.
    
    For getattr by FID, if object is remote file on parent, don't fetch
    LOOKUP lock, otherwise client may see stale dir entries.
    
    Cherry-picked-from: ae9fc81
    Cherry-picked-from-change: https://review.whamcloud.com/39769
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/38561
    Lustre-commit: f9a2da6
    
    Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Yingjin Qian <qian@ddn.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Change-Id: I37b36983735eca63da37f190456b5cc1b861b29e
    Lai Siyao authored and ofaaland committed Oct 10, 2020
    8d212e8
  11. LU-13437 mdt: rename misses remote LOOKUP lock revoke

    In rename, all objects but target may be remote, so to check whether
    source is a remote object on the source parent, we need to compare
    the MDTs on which they are located if both are remote. Add a helper
    function mdt_rename_source_lock() to handle all possible combinations.
    If target parent is remote, take a remote LOOKUP lock for target on
    the MDT where target parent is.
    
    Add sanityn.sh 81c.
    
    Cherry-picked-from: 23fa920
    Cherry-picked-from-change: https://review.whamcloud.com/39601
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/38181
    Lustre-commit: 4918fe4
    
    Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
    Change-Id: I2c134970d6abc8761528d01950b23495292cdf93
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
    Signed-off-by: Minh Diep <mdiep@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Lai Siyao authored and ofaaland committed Oct 10, 2020
    6dc8f51
  12. LU-13437 uapi: add OBD_CONNECT2_GETATTR_PFID

    Add OBD_CONNECT2_GETATTR_PFID connect flag to pack parent FID in
    getattr request, which will be used to check whether target is
    remote object, if so, don't take LOOKUP lock, otherwise client
    may see stale directory entries.
    
    Cherry-picked-from: daa9148
    Cherry-picked-from-change: https://review.whamcloud.com/39770
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/39289
    Lustre-commit: f384a87
    Test-parameters: trivial
    
    Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Neil Brown <neilb@suse.de>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Change-Id: Ibdf880934456f255f83cd4bac9d61ab5e1ed7330
    Lai Siyao authored and ofaaland committed Oct 10, 2020
    ddec375
  13. LU-13437 llite: pack parent FID in getattr

    Pack parent FID in getattr request if OBD_CONNECT2_GETATTR_PFID is
    enabled, otherwise fill it with target FID for backward compatibility.
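    
    The packing rule can be sketched as below. The request struct, the
    helper name and the flag's bit value are hypothetical; only the
    choice between parent FID and target FID comes from the description
    above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit; the real flag value is not given in this log. */
#define SK_CONNECT2_GETATTR_PFID	(1ULL << 0)

struct sk_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

struct sk_getattr_req {
	struct sk_fid target;	/* object being queried */
	struct sk_fid pfid;	/* parent FID, or a copy of target */
};

static void sk_pack_getattr(struct sk_getattr_req *req, uint64_t connect_flags2,
			    const struct sk_fid *target,
			    const struct sk_fid *parent)
{
	assert(req != NULL && target != NULL);

	req->target = *target;
	if ((connect_flags2 & SK_CONNECT2_GETATTR_PFID) && parent != NULL)
		req->pfid = *parent;	/* server understands the parent FID */
	else
		req->pfid = *target;	/* backward-compatible fill */
}
```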
    
    Cherry-picked-from: 3314727
    Cherry-picked-from-change: https://review.whamcloud.com/39771
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/39290
    Lustre-commit: 5f2c44b
    
    Fixes: f9a2da6 ("LU-13437 mdt: don't fetch LOOKUP lock for remot...")
    Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
    Reviewed-by: Neil Brown <neilb@suse.de>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Change-Id: I91bace23e67b548feb92fd885fb5e64e92c96408
    Lai Siyao authored and ofaaland committed Oct 10, 2020
    d06cc90
  14. LU-13608 tgt: abort recovery while reading update llog

    Abort reading the update LLOG from other MDTs when the recovery
    is aborted, so that the recovery process can be aborted in time.
    
    This patch also adds watchdog for the process of the replay request
    to detect possible stale process.
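    
    The abort check amounts to testing a flag inside the read loop. A toy
    version, with hypothetical names, a simulated abort request and an
    assumed error code, is shown here:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_recovery {
	int abort_requested;	/* set when recovery is being aborted */
};

/*
 * Walk nrecs log records. abort_at simulates an abort request arriving
 * just before record abort_at is processed (use -1 for "never").
 * On return, *processed holds the number of records handled.
 */
static int sk_read_update_log(struct sk_recovery *rec, int nrecs,
			      int abort_at, int *processed)
{
	assert(rec != NULL && processed != NULL);

	for (int i = 0; i < nrecs; i++) {
		if (i == abort_at)
			rec->abort_requested = 1;
		if (rec->abort_requested) {
			*processed = i;
			return -ECANCELED;	/* error code is an assumption */
		}
		/* A real reader would process record i here. */
	}
	*processed = nrecs;
	return 0;
}
```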
    
    Cherry-picked-from: 4142f05
    Cherry-picked-from-change: https://review.whamcloud.com/39284
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/38746
    Lustre-commit: 0496cdf
    
    Change-Id: Ie2de041360c9eba95ef9bfd14b00ac2709e6eace
    Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
    Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Hongchao Zhang authored and ofaaland committed Oct 10, 2020
    bd18e40
  15. LU-12820 osc: remove 'transient' arg from osc_enter_cache_try

    This arg is always '0', so remove it.
    Consequently, OBD_BRW_NOCACHE is never set, and
    cl_dirty_transit and obd_dirty_transit_pages
    are never non-zero, so they can be removed as well.
    
    Cherry-picked-from: f92c7a1
    Cherry-picked-from-change: https://review.whamcloud.com/39518
    Cherry-picked-from-branch: b2_12
    
    Lustre-change: https://review.whamcloud.com/36319
    Lustre-commit: 524deb6
    
    Patch also includes changes for atomic ops optimization
    to keep in sync with master branch:
    
    Lustre-change: https://review.whamcloud.com/33859
    Lustre-commit: 8b364fb
    
    Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
    Change-Id: Ia047affc33fb9277e6c28a8f6d7d088c385b51a8
    Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    neilbrown authored and ofaaland committed Oct 10, 2020
    d54d3ca

Commits on Oct 16, 2020

  1. LU-13590 kernel: new kernel [RHEL 7.9 3.10.0-1160.2.1.el7]

    This patch makes changes to support new RHEL 7.9 release
    for Lustre client.
    
    Cherry-picked-by: Olaf Faaland <faaland1@llnl.gov>
    Cherry-picked-from: 4f65883699f57b9293a4d21f475f96797ce2a757
    Cherry-picked-from-change: https://review.whamcloud.com/#/c/40177/1
    Cherry-picked-from-reviews: +1 Maloo, +1 Jenkins, +2 Code-Reviews
    Cherry-picked-from-branch: b2_12
    
    Test-Parameters: trivial clientdistro=el7.9
    Change-Id: I7a2846de48a6710d6d720d6ccc3176dba4afc6bb
    Signed-off-by: Jian Yu <yujian@whamcloud.com>
    Jian Yu authored and ofaaland committed Oct 16, 2020
    178609c
  2. LU-13590 kernel: RHEL 7.9 server support

    This patch makes changes to support new RHEL 7.9 release
    for Lustre server (kernel 3.10.0-1160.2.1.el7).
    
    Cherry-picked-by: Olaf Faaland <faaland1@llnl.gov>
    Cherry-picked-from: 66703259f56edac4eaffde53a4363fcc90dcee79
    Cherry-picked-from-change: https://review.whamcloud.com/#/c/40224/1/
    Cherry-picked-from-reviews: +1 Maloo, +1 Jenkins
    Cherry-picked-from-branch: b2_12
    
    Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9
    Change-Id: I7653091f2bd6a579447edb12045984d2829a8235
    Signed-off-by: Jian Yu <yujian@whamcloud.com>
    Jian Yu authored and ofaaland committed Oct 16, 2020
    121a1b8

Commits on Oct 22, 2020

  1. LU-13892 lnet: lock-up during router check

    This is a fix for the issue with LNet lock-up while waiting
    for routers to become active with check_routers_before_use
    option. Release ln_api_mutex while waiting to allow
    incoming connections to be handled.
    
    Cherry-picked-from: 877d95b
    Cherry-picked-from-branch: b2_12
    
    Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
    Change-Id: I63b1d1ce5ee2b27a3bd2cea78713fc6fc7502cf7
    Reviewed-on: https://review.whamcloud.com/40172
    Tested-by: jenkins <devops@whamcloud.com>
    Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Serguei Smirnov authored and defaziogiancarlo committed Oct 22, 2020
    85ffcf7