LU-13553 lnd: gracefully handle unexpected events
When a tx completes, the kiblnd_tx_complete() callback is invoked.
We ensure:
LASSERT (tx->tx_sending > 0);
However, this assertion is triggered in some rare scenarios.
tx_sending can be 0 at this point because either:
1. ib_post_send() failed but the OFED stack still sends
a tx complete event, or
2. we receive two different events for the same tx.
Instead of asserting, ignore that tx_complete event and print
the tx pointer and its status.
Lustre-change: https://review.whamcloud.com/38669
Lustre-commit: 60f9f53
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8cd192538c0c80abaef23a4b6e6906936043060b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38752
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13600 ptlrpc: limit rate of lock replays
Clients send all lock replays at once, which may overwhelm the server with a huge number of replays in the recovery queue and cause OOM effects. This patch adds rate control for lock replays on the client. The patch also includes a later fix for a signal_completed_replay() race.
Lustre-change: https://review.whamcloud.com/38920
Lustre-commit: 3b613a4
Lustre-change: https://review.whamcloud.com/39140
Lustre-commit: dc654756af63bd30802ebd86074019d1533a4d8f
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie557f8481c5facb690468d7136cf5feebe4e8f11
Reviewed-on: https://review.whamcloud.com/39111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-13088 ldlm: Fix sleeping function called in atomic
target_recovery_overseer() can sleep while holding a spinlock, which triggers a BUG warning. It is easily fixed by dropping the spinlock before waiting. In the case where the task waits, no useful information that could be protected by the spinlock is held, so nothing can be lost by dropping it.
Lustre-change: https://review.whamcloud.com/#/c/37063/
Lustre-commit: b29b931
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8bb3d02523b5dcfadac19f01ccb736d7b7f28239
Reviewed-on: https://review.whamcloud.com/37063
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39283
LU-12424 lnet: prevent loop in LNetPrimaryNID()
If discovery is disabled locally or at the remote end, then attempt discovery only once. Do not update the internal database when discovery is disabled, and do not repeat discovery. This change prevents LNet from hanging while waiting for discovery to complete.
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4543b0f71e6cf297a1a5f058ebcc6bf74b8ac328
Reviewed-on: https://review.whamcloud.com/35191
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38890
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
LU-12222 lnet: Introduce constant for the lolnd NID
This patch adds a new constant, LNET_NID_LO_0, to represent the lolnd NID 0@lo.
Lustre-change: https://review.whamcloud.com/38312
Lustre-commit: 56203e4
HPE-bug-id: LUS-8457
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I3e57637f297b8de306905a447af8f025e31d1fcf
Reviewed-on: https://review.whamcloud.com/38863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-12222 lnet: Primary NID of lolnd NID is the lolnd NID
We want Lustre traffic that is intended for the local peer to be sent and received over the lolnd. Currently, ptlrpc_uuid_to_peer() resolves a NID to the lolnd NID, but ptlrpc_connection_get() overwrites this selection with the result of LNetPrimaryNID(). Make LNetPrimaryNID() return the lolnd NID when it is passed the lolnd NID.
Lustre-change: https://review.whamcloud.com/38313
Lustre-commit: 33d2e44
HPE-bug-id: LUS-8457
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I02708bb45f8440091782ca7886bac7656efb0223
Reviewed-on: https://review.whamcloud.com/38864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-12222 ptlrpc: Check if NID is local, not just lolnd NID
There are a couple of places where we check whether a NID is the lolnd NID, but we really want to know whether the NID is local. Use LNetIsPeerLocal() to accomplish this.
Lustre-change: https://review.whamcloud.com/38388
Lustre-commit: 95bcc24
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ia17b9b4b54fd1063c42a6f8bdd0e593be1086683
Reviewed-on: https://review.whamcloud.com/38865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-9971 lnet: use after free in lnet_discover_peer_locked()
When the lnet_net_lock is unlocked, the peer attached to an lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer) can change, and the old peer can be deallocated. If we are really unlucky, all the churn could give us a new, different peer at the same address in memory. Change the reference counting on the lnet_peer lp so that it is guaranteed to be alive when we relock the lnet_net_lock for the cpt. When the reference count is dropped, lp may go away if it was unlinked, but the new peer is guaranteed to have a different address, so we can still correctly determine whether the peer changed and discovery should be redone.
LU-9971 lnet: fix peer ref counting
Exit the loop after the peer ref count has been incremented to avoid an incorrect ref count. The code makes sure that a peer is queued for discovery at most once if discovery is disabled. This is done to use discovery as a standard ping for gateways that do not have the discovery feature or have discovery disabled.
Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: Ia44dce20074b27ec0e77d7c1908c6a44ec73d326
Reviewed-on: https://review.whamcloud.com/28944
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38891
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
LU-13278 lnet: Reconcile discovery push and reply handling
Reconcile the logic for updating the multi-rail flag of a peer when processing a discovery PUSH with the logic used when processing a discovery REPLY.
Cray-bug-id: LUS-8516
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Idfb4c3729822d03b71f9440ac66176ae6b886022
Reviewed-on: https://review.whamcloud.com/37674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Stephen Champion <stephen.champion@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39575
Reviewed-by: Chris Horn <chris.horn@hpe.com>
LU-13763 osc: don't allow negative grants
Add a check in osc_init_grant() to prevent possible underflow of cl_avail_grant, and report an error if it happens.
Cherry-picked-from: c84fdeb5469bdc507caf1b4d5b876fce47cf5d86
Cherry-picked-from-change: https://review.whamcloud.com/#/c/39380
Cherry-picked-from-patch: 2
Cherry-picked-from-branch: b2_12
Cherry-picked-from-status: Maloo +1, Review +1
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Idcd25ed427c23735e1cdc70359bace43b5b9d886
LU-12687 osc: consume grants for direct I/O
The new IO engine implementation lost grant consumption for direct I/O writes. This led to a premature out-of-space condition during direct I/O. The following illustrates the problem:
# OSTSIZE=100000 sh llmount.sh
# dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
dd: error writing ‘/mnt/lustre/file’: No space left on device
Consume grants for direct I/O.
Try to consume grants in osc_queue_sync_pages() when it is called for pages that are being written with direct I/O.
Tests are added to verify grant consumption in buffered and direct I/O and to verify direct I/O overwrite when the OST is full. The overwrite test is for ldiskfs only, as zfs is unable to overwrite when it is full.
Cherry-picked-from: 8dd02c50f7df6fd0af9ead75d5d7774f32c211e2
Cherry-picked-from-change: https://review.whamcloud.com/#/c/39386
Cherry-picked-from-patch: 10
Cherry-picked-from-branch: b2_12
Cherry-picked-from-status: Maloo +1, Review +1
Lustre-change: https://review.whamcloud.com/35896
Lustre-commit: 05f326a
Fixes: 9fe4b52 ("LU-1030 osc: new IO engine implementation")
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
Cray-bug-id: LUS-7036
LU-13089 osc: revert "glimpse - search for active lock"
Revert "LU-11670 osc: glimpse - search for active lock"
This could cause assertion failures like the ones below:
LustreError: 13759:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION((((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) ASSERTION( atomic_read(&lock->l_refc) > 0 ) failed:
LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) LBUG
A glimpse callback races with a cancel callback.
This reverts commit 2548cb9.
Conflicts:
lustre/tests/sanityn.sh
Cherry-picked-from: 37205c60bb2d99363a0c9dbf29d8f4fd684b6fab
Cherry-picked-from-change: https://review.whamcloud.com/#/c/39819
Cherry-picked-from-patch: 1
Cherry-picked-from-branch: b2_12
Cherry-picked-from-status: Maloo +1, No Review
Cherry-picked-from-notes: Removed test_103
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I12063d0b3f1411e0d44393823a3e220cea6567d5
Revert "LU-12687 osc: consume grants for direct I/O"
This reverts commit 2fcd7cf.
Upstream has seen intermittent failures in testing on b2_12, consistent with the test failures seen on this patch. Reverting while the failures are investigated.
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
LU-12687 osc: consume grants for direct I/O
The new IO engine implementation lost grant consumption for direct I/O writes. This led to a premature out-of-space condition during direct I/O. The following illustrates the problem:
# OSTSIZE=100000 sh llmount.sh
# dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
dd: error writing ‘/mnt/lustre/file’: No space left on device
Consume grants for direct I/O.
Try to consume grants in osc_queue_sync_pages() when it is called for pages that are being written with direct I/O.
Tests are added to verify grant consumption in buffered and direct I/O and to verify direct I/O overwrite when the OST is full. The overwrite test is for ldiskfs only, as zfs is unable to overwrite when it is full.
Cherry-picked-from: 8dd02c50f7df6fd0af9ead75d5d7774f32c211e2
Cherry-picked-from-change: https://review.whamcloud.com/#/c/39386
Cherry-picked-from-patch: 10
Cherry-picked-from-branch: b2_12
Cherry-picked-from-status: Maloo +1, Review +1
Lustre-change: https://review.whamcloud.com/35896
Lustre-commit: 05f326a
Fixes: 9fe4b52 ("LU-1030 osc: new IO engine implementation")
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
Cray-bug-id: LUS-7036
LU-13653 mdt: ignore quota when creating slave stripe
When creating a striped directory, the quota limit has already been checked on the master MDT, so quota should be ignored when creating the slave stripe object.
Cherry-picked-from: 475f891
Cherry-picked-from-change: https://review.whamcloud.com/39282
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/#/c/38875/
Lustre-commit: f762ace
Change-Id: Ia53b1975a8d66c78725feb313659f7a9b889e735
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
LU-12758 quota: clear default flag for new ID
When the quota limits are set to 0 by "lfs setquota", the default flag is not cleared if the lquota_entry has just been created for the quota ID for the first time, because the quota limits are unchanged.
Cherry-picked-from: hash 07aa659
Cherry-picked-from-change: https://review.whamcloud.com/38808
Cherry-picked-from-branch: b2_12
This patch is back-ported from the following one:
Lustre-commit: ce86e23
Lustre-change: https://review.whamcloud.com/36236
Change-Id: I7f44ce0cb13783ca5bede2f55cd0707f1ccbc8ca
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13742 llite: do not bypass selinux xattr handling
Without the hint from selinux_is_enabled() to determine whether SELinux is running at boot, the performance fix from LU-549 that skips handling of SELinux xattrs cannot work correctly. The correct path is to act as if SELinux is enabled. This fixes a bug introduced by LU-12355 that now exists in RHEL 8.2 kernels on clients with SELinux enabled.
Cherry-picked-from: d657c96
Cherry-picked-from-change: https://review.whamcloud.com/39671
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/39569
Lustre-commit: 994287b
Fixes: 39e5bfa ("LU-12355 llite: include file linux/selinux.h removed")
Test-Parameters: clientdistro=el8.2 serverdistro=el8.2 clientselinux testlist=sanity-selinux
Test-Parameters: clientdistro=el8.1 serverdistro=el8.1 clientselinux testlist=sanity-selinux
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6fb5ed9ecdb79545225b5586b90509eb157a355b
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
LU-13761 o2ib: Fix compilation with MOFED 5.1
A new argument was added to rdma_reject() in MOFED 5.1 and Linux 5.8. Add a configure check and support both versions of rdma_reject().
Cherry-picked-from: ba702c7
Cherry-picked-from-change: https://review.whamcloud.com/39781
Cherry-picked-from-branch: b2_12
Lustre-commit: 956deb0
Lustre-change: https://review.whamcloud.com/39323
Test-Parameters: trivial
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Change-Id: I2b28991f335658b651b21a09899b7b17ab2a9d57
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13187 osd-ldiskfs: don't enforce max dir size limit on IAM objects
Add ext4-no-max-dir-size-limit-for-iam-objects.patch to introduce a new inode state, EXT4_STATE_IAM, and use it to mark IAM objects.
Cherry-picked-from: a73f4e5
Cherry-picked-from-change: https://review.whamcloud.com/39882
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/39823
Lustre-commit: 03e6db5
Change-Id: I3bcc5435ea07edb9fa265dcd8e3261d849495f00
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13907 llite: don't set FS_REQUIRES_DEV on client
If doing a client-only build, do not set the FS_REQUIRES_DEV flag for the 'lustre' filesystem type. This is only needed on the server, but the filesystem type declaration is shared between both.
In master, this was fixed by declaring a new 'lustre_tgt' filesystem type and using that for server filesystem mounts. However, for 2.12 this is overkill, and it is possible to get a 95% fix by dropping the FS_REQUIRES_DEV flag for the common case of client-only builds.
Cherry-picked-from: 76531b7
Cherry-picked-from-change: https://review.whamcloud.com/39674
Cherry-picked-from-branch: b2_12
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: Iab2e78515aba018e2a6bceb324ad1b8a313ebbe5
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13471 lnet: use the same src nid for discovery
When discovering a remote peer (not on the same network), a GET is sent to the peer to retrieve the peer's interfaces. If discovery is on, this is followed by a PUSH to push the node's interfaces. However, if both the node and the peer have multiple interfaces, it is likely that the GET and the PUSH will originate on different interfaces. When the peer receives the PUSH, it will not be able to connect the two NIDs and will not be able to consolidate the node's NIDs.
This issue is specific to remote peers because, at the time the push handler is invoked, the remote lpni has not been created yet; lnet_parse() creates the lpni of the gateway.
Similar to the existing strategy of using the same source NID for all the messages of an RPC, discovery should use the same source NID for both the GET and the PUSH. This patch stores the source NID of the interface the GET was sent on and uses it for the PUSH.
Cherry-picked-from: b4b93d2 LU-13471 lnet: use the same src nid for discovery
Cherry-picked-from-change: https://review.whamcloud.com/39576
Cherry-picked-from-branch: b2_12
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I5a13ab7799b2ddc47714202bcbed786b0d3940b7
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13437 lmv: check stripe FID sanity
A striped directory layout may be broken; if any stripe FID is not sane, return -ENODEV.
Cherry-picked-from: f1712b3
Cherry-picked-from-change: https://review.whamcloud.com/39600
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/38560
Lustre-commit: 698a496
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7ed8c7c561e34625e2cb29bfd14bc0ecf3fce46c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13437 mdt: don't fetch LOOKUP lock for remote object
Pack the parent FID in getattr-by-FID requests; it will be used to check whether the child is a remote object relative to its parent. The helper function is called mdt_is_remote_object().
NB: a directory shard is not treated as a remote object, because otherwise the client would need to revalidate shards whenever the directory is accessed, which would hurt performance significantly.
For getattr by FID, if the object is a remote file relative to its parent, don't fetch the LOOKUP lock; otherwise the client may see stale directory entries.
Cherry-picked-from: ae9fc81
Cherry-picked-from-change: https://review.whamcloud.com/39769
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/38561
Lustre-commit: f9a2da6
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I37b36983735eca63da37f190456b5cc1b861b29e
LU-13437 mdt: rename misses remote LOOKUP lock revoke
In rename, all objects except the target may be remote, so to check whether the source is a remote object relative to the source parent, we need to compare the MDTs they are located on if both are remote. Add a helper function, mdt_rename_source_lock(), to handle all possible combinations.
If the target parent is remote, take a remote LOOKUP lock for the target on the MDT where the target parent is located.
Add sanityn.sh test 81c.
Cherry-picked-from: 23fa920
Cherry-picked-from-change: https://review.whamcloud.com/39601
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/38181
Lustre-commit: 4918fe4
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2c134970d6abc8761528d01950b23495292cdf93
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13437 uapi: add OBD_CONNECT2_GETATTR_PFID
Add the OBD_CONNECT2_GETATTR_PFID connect flag to pack the parent FID in getattr requests. It will be used to check whether the target is a remote object; if so, don't take the LOOKUP lock, otherwise the client may see stale directory entries.
Cherry-picked-from: daa9148
Cherry-picked-from-change: https://review.whamcloud.com/39770
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/39289
Lustre-commit: f384a87
Test-parameters: trivial
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: Ibdf880934456f255f83cd4bac9d61ab5e1ed7330
LU-13437 llite: pack parent FID in getattr
Pack the parent FID in the getattr request if OBD_CONNECT2_GETATTR_PFID is enabled; otherwise, fill it with the target FID for backward compatibility.
Cherry-picked-from: 3314727
Cherry-picked-from-change: https://review.whamcloud.com/39771
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/39290
Lustre-commit: 5f2c44b
Fixes: f9a2da6 ("LU-13437 mdt: don't fetch LOOKUP lock for remot...")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I91bace23e67b548feb92fd885fb5e64e92c96408
LU-13608 tgt: abort recovery while reading update llog
Abort reading the update llog from other MDTs when recovery is aborted, so that the recovery process can be aborted in time.
This patch also adds a watchdog for the replay request processing to detect possibly stale processes.
Cherry-picked-from: 4142f05
Cherry-picked-from-change: https://review.whamcloud.com/39284
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/38746
Lustre-commit: 0496cdf
Change-Id: Ie2de041360c9eba95ef9bfd14b00ac2709e6eace
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-12820 osc: remove 'transient' arg from osc_enter_cache_try
This argument is always '0', so remove it. Consequently, OBD_BRW_NOCACHE is never set, and cl_dirty_transit and obd_dirty_transit_pages are always zero, so they can be removed as well.
Cherry-picked-from: f92c7a1
Cherry-picked-from-change: https://review.whamcloud.com/39518
Cherry-picked-from-branch: b2_12
Lustre-change: https://review.whamcloud.com/36319
Lustre-commit: 524deb6
The patch also includes the atomic ops optimization changes, to keep in sync with the master branch:
Lustre-change: https://review.whamcloud.com/33859
Lustre-commit: 8b364fb
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia047affc33fb9277e6c28a8f6d7d088c385b51a8
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13590 kernel: new kernel [RHEL 7.9 3.10.0-1160.2.1.el7]
This patch adds support for the new RHEL 7.9 release for the Lustre client.
Cherry-picked-by: Olaf Faaland <faaland1@llnl.gov>
Cherry-picked-from: 4f65883699f57b9293a4d21f475f96797ce2a757
Cherry-picked-from-change: https://review.whamcloud.com/#/c/40177/1
Cherry-picked-from-reviews: +1 Maloo, +1 Jenkins, +2 Code-Reviews
Cherry-picked-from-branch: b2_12
Test-Parameters: trivial clientdistro=el7.9
Change-Id: I7a2846de48a6710d6d720d6ccc3176dba4afc6bb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
LU-13590 kernel: RHEL 7.9 server support
This patch adds support for the new RHEL 7.9 release for the Lustre server (kernel 3.10.0-1160.2.1.el7).
Cherry-picked-by: Olaf Faaland <faaland1@llnl.gov>
Cherry-picked-from: 66703259f56edac4eaffde53a4363fcc90dcee79
Cherry-picked-from-change: https://review.whamcloud.com/#/c/40224/1/
Cherry-picked-from-reviews: +1 Maloo, +1 Jenkins
Cherry-picked-from-branch: b2_12
Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9
Change-Id: I7653091f2bd6a579447edb12045984d2829a8235
Signed-off-by: Jian Yu <yujian@whamcloud.com>
LU-13892 lnet: lock-up during router check
This fixes an LNet lock-up while waiting for routers to become active with the check_routers_before_use option. Release ln_api_mutex while waiting so that incoming connections can be handled.
Cherry-picked-from: 877d95b
Cherry-picked-from-branch: b2_12
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I63b1d1ce5ee2b27a3bd2cea78713fc6fc7502cf7
Reviewed-on: https://review.whamcloud.com/40172
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>