Wang-Jianchao/…
Commits on Jan 10, 2022
-
blk: introduce iostat per cgroup module
iostat can only track io statistics for a whole device. This patch introduces per-cgroup iostat based on the blk-rq-qos framework, which can track bw, iops, queue latency and device latency, and distinguish regular data from metadata. The per-cgroup blkio.iostat output has the following format:
vda-data bytes iops queue_lat dev_lat [ditto]  [ditto]
    meta \___________ ______________/    |        |
                     v                   v        v
                   read               write   discard
In particular, the blkio.iostat of the root cgroup only outputs the statistics of IOs from the root cgroup itself. However, a non-root blkio.iostat outputs the statistics of all of its children cgroups. With meta stats in the root cgroup, we hope to observe the performance of fs metadata. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk: make request able to carry blkcg_gq
After blk_update_request, the bios can be gone, so we cannot track the request per cgroup in the following IO completion path. This patch adds a blkcg_gq to the request: the reference is taken when the bio is installed and put before the request is released. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
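The lifetime problem described above can be sketched with a toy reference-counting model. This is only an illustration in userspace Python, not the kernel code: the classes `Blkg`, `Bio` and `Request` and their methods are hypothetical stand-ins, and `update()` merely models the point where blk_update_request lets the bio go away.

```python
class Blkg:
    """Toy stand-in for a blkcg_gq with a reference count."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refs = 1  # reference held by the owning cgroup

    def get(self) -> "Blkg":
        self.refs += 1
        return self

    def put(self) -> None:
        if self.refs <= 0:
            raise RuntimeError("put() without a matching get()")
        self.refs -= 1


class Bio:
    """Toy bio that holds its own reference on the blkg."""

    def __init__(self, blkg: Blkg) -> None:
        self.blkg = blkg.get()

    def end(self) -> None:
        self.blkg.put()
        self.blkg = None


class Request:
    """Toy request that takes its own blkg reference when a bio is installed."""

    def __init__(self) -> None:
        self.bio = None
        self.blkg = None

    def install_bio(self, bio: Bio) -> None:
        self.bio = bio
        self.blkg = bio.blkg.get()  # independent of the bio's lifetime

    def update(self) -> None:
        # Models blk_update_request(): the bio completes and may be gone.
        self.bio.end()
        self.bio = None

    def release(self) -> None:
        self.blkg.put()  # dropped just before the request is released
        self.blkg = None


blkg = Blkg("example-cgroup")
rq = Request()
rq.install_bio(Bio(blkg))
rq.update()
name_seen_at_completion = rq.blkg.name  # still valid although the bio is gone
rq.release()
```

In this model the request can still attribute its completion to the cgroup after the bio has ended, and the reference count returns to its starting value once the request is released.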
-
blk: remove unused interfaces of blk-rq-qos
No functional changes here Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk-ioprio: make ioprio pluggable and modular
Make blk-ioprio pluggable and modular. Then we can close or open it through /sys/block/xxx/queue/qos and rmmod the module if we don't need it which can release one blkcg policy slot. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk: rename ioprio.c to ioprio-common.c
In the next patch, blk-ioprio.c is changed to a module named ioprio.ko. Rename ioprio.c to ioprio-common.c to avoid a duplicate ioprio.o in the Makefile. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk-iocost: make iocost pluggable and modular
Make blk-iocost pluggable and modular. Then we can close or open it through /sys/block/xxx/queue/qos and rmmod the module if we don't need it which can release one blkcg policy slot. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk: use standalone macro to control bio.bi_iocost_cost
This is a preparation to make iocost modular Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk: remove unused BLK_RQ_IO_DATA_LEN
Remove it as nobody uses it any more. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk-iolatency: make iolatency pluggable and modular
Make blk-iolatency pluggable and modular. Then we can close or open it through /sys/block/xxx/queue/qos and rmmod the module if we don't need it which can release one blkcg policy slot. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
cgroup: export following two interfaces
This is a preparation for making blk-rq-qos modular. There is no functional change; it just exports the interfaces pr_cont_cgroup_path and cgroup_parse_float. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk: export following interfaces
This is a preparation for making blk-rq-qos policies modular; there is no functional change. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
This patch makes wbt pluggable through /sys/block/xxx/queue/qos. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
-
blk: make blk-rq-qos support pluggable and modular policy
blk-rq-qos is a standalone framework outside of io-sched that can be used to control or observe IO progress in the block layer with hooks. blk-rq-qos is a great design, but right now it is totally fixed and built-in, which shuts out people who want to use it from an external module. This patch makes blk-rq-qos policies pluggable and modular.
(1) Add code to maintain the rq_qos_ops. A rq-qos module needs to register itself with rq_qos_register(). The original enum rq_qos_id will be removed in a following patch; policies will use a dynamic id maintained by rq_qos_ida.
(2) Add an .init callback to rq_qos_ops. We use it to initialize the resource.
(3) Add /sys/block/x/queue/qos. We can use '+name' or '-name' to open or close a blk-rq-qos policy.
Because the rq-qos list can be modified at any time, rq_qos_id(), which has been renamed to rq_qos_by_id(), has to iterate the list under sysfs_lock or queue_lock. This patch adapts the code for this. For more details, please refer to the comment above rq_qos_get(). In addition, rq_qos_exit() is moved to blk_cleanup_queue. Except for these modifications, there is no other functional change here. Following patches will adapt the code of wbt, iolatency, iocost and ioprio to make them pluggable and modular one by one. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
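The '+name' / '-name' convention for the qos attribute can be sketched from userspace as follows. This is a minimal illustration, not part of the patch: the helper names `qos_toggle_command` and `set_qos_policy` are hypothetical, the policy name `wbt` is only an example, and the `queue/qos` attribute is assumed to exist only on kernels carrying this patch series.

```python
from pathlib import Path


def qos_toggle_command(name: str, enable: bool) -> str:
    """Build the string written to the qos attribute: '+name' opens, '-name' closes."""
    if not name or any(c.isspace() for c in name):
        raise ValueError("policy name must be a non-empty token without whitespace")
    return ("+" if enable else "-") + name


def set_qos_policy(device: str, name: str, enable: bool,
                   sysfs_root: str = "/sys/block") -> Path:
    """Write the toggle to <sysfs_root>/<device>/queue/qos and return the path used.

    Assumption: the attribute accepts one '+name' or '-name' token per write,
    as the commit message describes. Writing to real sysfs requires root.
    """
    attr = Path(sysfs_root) / device / "queue" / "qos"
    attr.write_text(qos_toggle_command(name, enable))
    return attr
```

Under these assumptions, `set_qos_policy("vda", "wbt", True)` would write `+wbt` to `/sys/block/vda/queue/qos`. The `sysfs_root` parameter exists only so the helper can be exercised against a scratch directory.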
Commits on Jan 7, 2022
-
-
cpuset: convert 'allowed' in __cpuset_node_allowed() to be boolean
Convert 'allowed' in __cpuset_node_allowed() to be boolean since the return types of node_isset() and __cpuset_node_allowed() are both boolean. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Commits on Jan 6, 2022
-
-
cgroup/rstat: check updated_next only for root
After commit dc26532 ("cgroup: rstat: punt root-level optimization to individual controllers"), each rstat on updated_children list has its ->updated_next not NULL. This means we can remove the check on ->updated_next, if we make sure the subtree from @root is on list, which could be done by checking updated_next for root. tj: Coding style fixes. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
cgroup: rstat: explicitly put loop variant in while
Instead of running the do-while unconditionally, let's put the loop variant in the while condition. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
-
selftests: cgroup: Test open-time cgroup namespace usage for migratio…
…n checks When a task is writing to an fd opened by a different task, the perm check should use the cgroup namespace of the latter task. Add a test for it. Tested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
selftests: cgroup: Test open-time credential usage for migration checks
When a task is writing to an fd opened by a different task, the perm check should use the credentials of the latter task. Add a test for it. Tested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
selftests: cgroup: Make cg_create() use 0755 for permission instead o…
…f 0644 0644 is an odd perm to create a cgroup which is a directory. Use the regular 0755 instead. This is necessary for euid switching test case. Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
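To see why 0644 is an odd mode for a directory, the permission bits can be printed with Python's standard `stat` module. On a directory the execute bit means "search", so a directory without it cannot be traversed by an unprivileged user; that is presumably why the problem only matters once the test switches euid. The helper `searchable_by_owner` is a hypothetical name used for illustration.

```python
import stat


def searchable_by_owner(mode: int) -> bool:
    """True if the owner has the execute (search) bit on a directory mode."""
    return bool(mode & stat.S_IXUSR)


for mode in (0o644, 0o755):
    # S_IFDIR marks the entry as a directory so filemode() prints a leading 'd'.
    print(oct(mode), stat.filemode(stat.S_IFDIR | mode), searchable_by_owner(mode))
```

The 0644 row shows `drw-r--r--` with no search bit, while 0755 shows `drwxr-xr-x`.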
-
cgroup: Use open-time cgroup namespace for process migration perm checks
cgroup process migration permission checks are performed at write time as whether a given operation is allowed or not is dependent on the content of the write - the PID. This currently uses current's cgroup namespace which is a potential security weakness as it may allow scenarios where a less privileged process tricks a more privileged one into writing into a fd that it created. This patch makes cgroup remember the cgroup namespace at the time of open and uses it for migration permission checks instead of current's. Note that this only applies to cgroup2 as cgroup1 doesn't have namespace support. This also fixes a use-after-free bug on cgroupns reported in https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com Note that backporting this fix also requires the preceding patch. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com Fixes: 5136f63 ("cgroup: implement "nsdelegate" mount option") Signed-off-by: Tejun Heo <tj@kernel.org>
-
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
of->priv is currently used by each interface file implementation to store private information. This patch collects the current two private data usages into struct cgroup_file_ctx which is allocated and freed by the common path. This allows generic private data which applies to multiple files, which will be used in the following patch. Note that cgroup_procs iterator is now embedded as procs.iter in the new cgroup_file_ctx so that it doesn't need to be allocated and freed separately. v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in cgroup_file_ctx as suggested by Linus. v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too. Converted. Didn't change to embedded allocation as cgroup1 pidlists get stored for caching. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michal Koutný <mkoutny@suse.com>
-
cgroup: Use open-time credentials for process migraton perm checks
cgroup process migration permission checks are performed at write time as whether a given operation is allowed or not is dependent on the content of the write - the PID. This currently uses current's credentials which is a potential security weakness as it may allow scenarios where a less privileged process tricks a more privileged one into writing into a fd that it created. This patch makes both cgroup2 and cgroup1 process migration interfaces use the credentials saved at the time of open (file->f_cred) instead of current's. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Fixes: 187fe84 ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy") Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
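The difference between checking the writer's credentials and checking the opener's credentials can be shown with a toy model. This is a sketch under simplifying assumptions, not the kernel logic: `Cred`, `OpenFile`, `may_migrate` and the two `write_pid_*` functions are hypothetical names, and the permission rule is reduced to "root or the cgroup's owner".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    """Toy stand-in for a task's credentials."""
    uid: int


@dataclass
class OpenFile:
    """Toy stand-in for an open cgroup.procs file; f_cred is captured at open time."""
    f_cred: Cred


def may_migrate(cred: Cred, owner_uid: int) -> bool:
    """Simplified permission rule: root or the cgroup's owner may migrate."""
    return cred.uid == 0 or cred.uid == owner_uid


def write_pid_old(f: OpenFile, writer: Cred, owner_uid: int) -> bool:
    """Pre-fix behaviour: check the credentials of whoever performs the write."""
    return may_migrate(writer, owner_uid)


def write_pid_new(f: OpenFile, writer: Cred, owner_uid: int) -> bool:
    """Post-fix behaviour: check the credentials saved when the file was opened."""
    return may_migrate(f.f_cred, owner_uid)


unprivileged = Cred(uid=1000)
privileged = Cred(uid=0)
fd = OpenFile(f_cred=unprivileged)  # opened by the less privileged task

# The privileged task is tricked into writing to the fd it was handed.
print(write_pid_old(fd, privileged, owner_uid=0))  # True: write-time check is fooled
print(write_pid_new(fd, privileged, owner_uid=0))  # False: open-time check refuses
```

In the model, the write-time check passes because the tricked writer is root, while the open-time check fails because the fd was opened by uid 1000.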
Commits on Jan 5, 2022
-
Merge tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kern…
…el/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from bpf, and WiFi. One last pull request, turns out some of the recent fixes did more harm than good.
Current release - regressions:
  - Revert "xsk: Do not sleep in poll() when need_wakeup set", made the problem worse
  - Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register", broke EPROBE_DEFER handling
  - Revert "net: usb: r8152: Add MAC pass-through support for more Lenovo Docks", broke setups without a Lenovo dock
Current release - new code bugs:
  - selftests: set amt.sh executable
Previous releases - regressions:
  - batman-adv: mcast: don't send link-local multicast to mcast routers
Previous releases - always broken:
  - ipv4/ipv6: check attribute length for RTA_FLOW / RTA_GATEWAY
  - sctp: hold endpoint before calling cb in sctp_transport_lookup_process
  - mac80211: mesh: embed mesh_paths and mpp_paths into ieee80211_if_mesh to avoid complicated handling of sub-object allocation failures
  - seg6: fix traceroute in the presence of SRv6
  - tipc: fix a kernel-infoleak in __tipc_sendmsg()"
* tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
  selftests: set amt.sh executable
  Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"
  sfc: The RX page_ring is optional
  iavf: Fix limit of total number of queues to active queues of VF
  i40e: Fix incorrect netdev's real number of RX/TX queues
  i40e: Fix for displaying message regarding NVM version
  i40e: fix use-after-free in i40e_sync_filters_subtask()
  i40e: Fix to not show opcode msg on unsuccessful VF MAC change
  ieee802154: atusb: fix uninit value in atusb_set_extended_addr
  mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
  mac80211: initialize variable have_higher_than_11mbit
  sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
  netrom: fix copying in user data in nr_setsockopt
  udp6: Use Segment Routing Header for dest address if present
  icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  seg6: export get_srh() for ICMP handling
  Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register"
  ipv6: Do cleanup if attribute validation fails in multipath route
  ipv6: Continue processing multipath route even if gateway attribute is invalid
  net/fsl: Remove leftover definition in xgmac_mdio
  ...
-
selftests: set amt.sh executable
amt.sh test script will not work because it doesn't have execution permission. So, it adds execution permission. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: c08e8ba ("selftests: add amt interface selftest script") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220105144436.13415-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo …
…Docks" This reverts commit f77b83b. This change breaks multiple usb to ethernet dongles attached on Lenovo USB hub. Fixes: f77b83b ("net: usb: r8152: Add MAC passthrough support for more Lenovo Docks") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Link: https://lore.kernel.org/r/20220105155102.8557-1-aaron.ma@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aaron Ma authored and Jakub Kicinski committed Jan 5, 2022
-
Merge tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linu…
…x/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "Here are two last fixes for this release cycle from the GPIO subsystem: - fix irq offset calculation in gpio-aspeed-sgpio - update the MAINTAINERS entry for gpio-brcmstb" * tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: update gpio-brcmstb maintainers gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler
-
Merge tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub…
…/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2022-01-05 Below I have a last minute fix for the atusb driver. Pavel fixes a KASAN uninit report for the driver. This version is the minimal impact fix to ease backporting. A bigger rework of the driver to avoid potential similar problems is ongoing and will come through net-next when ready. * tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: ieee802154: atusb: fix uninit value in atusb_set_extended_addr ==================== Link: https://lore.kernel.org/r/20220105153914.512305-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski committed Jan 5, 2022
-
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git…
…/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-01-04 This series contains updates to i40e and iavf drivers. Mateusz adjusts displaying of failed VF MAC message when the failure is expected as well as modifying an NVM info message to not confuse the user for i40e. Di Zhu fixes a use-after-free issue with MAC filters for i40e. Jedrzej fixes an issue with misreporting of Rx and Tx queues during reinitialization for i40e. Karen corrects checking of channel queue configuration to occur against active queues for iavf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
sfc: The RX page_ring is optional
The RX page_ring is an optional feature that improves performance. When allocation fails the driver can still function, but possibly with a lower bandwidth. Guard against dereferencing a NULL page_ring. Fixes: 2768935 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reported-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Link: https://lore.kernel.org/r/164111288276.5798.10330502993729113868.stgit@palantir17.mph.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Habets authored and Jakub Kicinski committed Jan 5, 2022
Commits on Jan 4, 2022
-
iavf: Fix limit of total number of queues to active queues of VF
In the absence of this validation, if the user requests to configure queues more than the enabled queues, it results in sending the requested number of queues to the kernel stack (due to the asynchronous nature of VF response), in which case the stack might pick a queue to transmit that is not enabled and result in Tx hang. Fix this bug by limiting the total number of queues allocated for VF to active queues of VF. Fixes: d5b33d0 ("i40evf: add ndo_setup_tc callback to i40evf") Signed-off-by: Ashwin Vijayavel <ashwin.vijayavel@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
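The validation described above amounts to rejecting a request whose total queue count exceeds the VF's active queues. A minimal sketch of that check follows; it is not the driver code, and the function name `validate_tc_queue_request` and its parameters are hypothetical.

```python
from typing import Sequence


def validate_tc_queue_request(queues_per_tc: Sequence[int], active_queues: int) -> bool:
    """Return True only if the requested queues fit within the VF's active queues."""
    if any(count <= 0 for count in queues_per_tc):
        return False  # every traffic class needs at least one queue
    return sum(queues_per_tc) <= active_queues


print(validate_tc_queue_request([2, 2], active_queues=4))  # fits
print(validate_tc_queue_request([4, 4], active_queues=4))  # exceeds active queues
```

Rejecting the oversized request up front is what keeps the stack from later picking a transmit queue that is not enabled.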
-
i40e: Fix incorrect netdev's real number of RX/TX queues
The queue representation in sysfs was wrong during the driver's reinitialization when the number of online CPUs was less than the number of combined queues. It was caused by a stopped NetworkManager, which is responsible for calling the vsi_open function during the driver's initialization. In a specific situation (e.g. 12 CPUs online) there were 16 queues in /sys/class/net/<iface>/queues. Modifying the queues with a value higher than the number of online CPUs then caused write errors and other errors. Add updating of the sysfs queue representation during driver initialization. Fixes: 41c445f ("i40e: main driver core") Signed-off-by: Lukasz Cieplicki <lukaszx.cieplicki@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
i40e: Fix for displaying message regarding NVM version
When loading the i40e driver, it prints a message like: 'The driver for the device detected a newer version of the NVM image v1.x than expected v1.y. Please install the most recent version of the network driver.' This is misleading as the driver is working as expected. Fix that by removing the second part of message and changing it from dev_info to dev_dbg. Fixes: 4fb29bd ("i40e: The driver now prints the API version in error message") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>