Initial hack at kernel build system support for Rust #3

Closed
wants to merge 1 commit into from

Owner

@alex alex commented Aug 22, 2019

No description provided.

alex pushed a commit that referenced this pull request Aug 22, 2019
A deadlock with this stacktrace was observed.

The loop thread does a GFP_KERNEL allocation, which calls into the
dm-bufio shrinker, and the shrinker depends on I/O completion in the
dm-bufio subsystem.

In order to fix the deadlock (and other similar ones), we set the flag
PF_MEMALLOC_NOIO at loop thread entry.
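
To make the mechanism concrete, here is a minimal user-space sketch of how a per-task "no I/O" flag can narrow the allocation mask that reclaim paths see. It is only a model, not the kernel code: the `_SK` constants, `struct task_sk`, `loop_thread_enter()`, `effective_gfp()` and `shrinker_may_wait_for_io()` are hypothetical stand-ins, and the bit values are made up rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define GFP_IO_SK            0x1u  /* allocation may start or wait for I/O */
#define GFP_FS_SK            0x2u  /* allocation may call into filesystems */
#define GFP_KERNEL_SK        (GFP_IO_SK | GFP_FS_SK)
#define PF_MEMALLOC_NOIO_SK  0x1u  /* task-wide "no I/O during allocation" */

/* Hypothetical stand-in for the per-task state. */
struct task_sk {
    unsigned int flags;
};

/* Model of the loop thread entry: mark the task once, up front. */
void loop_thread_enter(struct task_sk *task)
{
    assert(task != NULL);
    task->flags |= PF_MEMALLOC_NOIO_SK;
}

/* Mask that an allocation (and any reclaim it triggers) effectively runs with. */
unsigned int effective_gfp(const struct task_sk *task, unsigned int gfp)
{
    assert(task != NULL);
    if (task->flags & PF_MEMALLOC_NOIO_SK)
        gfp &= ~(GFP_IO_SK | GFP_FS_SK);
    return gfp;
}

/* A shrinker-like check: only wait for I/O when the mask allows it. */
int shrinker_may_wait_for_io(unsigned int gfp)
{
    return (gfp & GFP_IO_SK) != 0;
}
```

In this model, a GFP_KERNEL-style request made after `loop_thread_enter()` reaches the shrinker check without the I/O bit, so the shrinker has no reason to block on I/O that the same thread is responsible for completing.
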

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

  PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
   #0 [ffff88272f5af228] __schedule at ffffffff8173f405
   #1 [ffff88272f5af280] schedule at ffffffff8173fa27
   #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
   #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
   #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
   #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
   #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
   #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
   #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
   #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
  #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
  #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
  #12 [ffff88272f5af760] new_slab at ffffffff811f4523
  #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
  #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
  #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
  #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
  #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
  #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
  #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
  #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
  #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
  #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
  #23 [ffff88272f5afec0] kthread at ffffffff810a8428
  #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
alex pushed a commit that referenced this pull request Aug 22, 2019
…OL_MF_STRICT were specified

When both MPOL_MF_MOVE* and MPOL_MF_STRICT are specified, mbind() should
try its best to migrate misplaced pages; if some of the pages cannot be
migrated, it should then return -EIO.

There are three different sub-cases:
 1. vma is not migratable
 2. vma is migratable, but there are unmovable pages
 3. vma is migratable, pages are movable, but migrate_pages() fails

If #1 happens, the kernel just aborts immediately and returns -EIO,
after a7f40cf ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").

If #3 happens, the kernel sets the policy and migrates pages on a
best-effort basis, but does not roll back the migrated pages or reset
the policy.

Before that commit, the two cases behaved the same way.  It would be
better to keep their behavior consistent.  But rolling back the migrated
pages and resetting the policy does not sound feasible, so just make #1
behave the same as #3.

Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt
rollback, determine which exact page(s) are violating the policy, etc.

Make queue_pages_range() return 1 to indicate there are unmovable pages
or vma is not migratable.

Case #2 is not handled correctly in the current kernel; the following
patch will fix it.
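
As a rough illustration of the return-value convention described above, here is a small user-space sketch. It is not the kernel implementation: `struct range_sk`, `queue_pages_range_sketch()` and `do_mbind_sketch()` are hypothetical names, and the sketch assumes the caller passed both MPOL_MF_MOVE* and MPOL_MF_STRICT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical description of one mbind() request; not a kernel structure. */
struct range_sk {
    int vma_migratable;   /* sub-case 1 when 0 */
    int unmovable_pages;  /* sub-case 2 when > 0 */
    int migrate_failures; /* sub-case 3 when > 0 */
    int policy_applied;   /* set by do_mbind_sketch() */
};

/* Returns 1 when some pages cannot be queued for migration, else 0. */
int queue_pages_range_sketch(const struct range_sk *r)
{
    assert(r != NULL);
    return (!r->vma_migratable || r->unmovable_pages > 0) ? 1 : 0;
}

/*
 * Assumes MPOL_MF_MOVE* and MPOL_MF_STRICT were both requested.
 * The policy is applied and migration is best-effort in every sub-case;
 * there is no early abort and no rollback. -EIO only reports that
 * something was left behind.
 */
int do_mbind_sketch(struct range_sk *r)
{
    int ret = queue_pages_range_sketch(r);

    r->policy_applied = 1;

    if (ret > 0 || r->migrate_failures > 0)
        return -EIO;
    return 0;
}
```

With this shape, a non-migratable vma (sub-case 1) and a failed migration (sub-case 3) look the same to userspace: the policy is set and the call returns -EIO.
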

[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
  Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
@alex alex force-pushed the module-make-rust branch 2 times, most recently from 5539004 to bf6e269 Compare August 22, 2019 02:43
# TODO: release/debug
$(obj)/%.rust.a: $(src)/Cargo.toml $(wildcard $(src)/src/*.rs) $(srctree)/arc/$(SRCARCH)/$(ARCH)-kernel-target.json
	cd $(src); env -u MAKE -u MAKEFLAGS cargo xbuild --target=$(srctree)/arc/$(SRCARCH)/$(ARCH)-kernel-target.json
	cp $(src)/target/x86_64-linux-kernel-module/debug/lib%.a $(obj)/%.rust.a

$(obj)/%.rust.a can probably be $@

@alex
Owner Author

alex commented Aug 28, 2019

OK, attempting to build against the kernel in this PR is not succeeding. I built it with make defconfig + make menuconfig, with Rust enabled.

alexgaynor@penguin ~/p/l/hello-world> cat Makefile 
obj-m := hello_world.rust.o
KDIR ?= /lib/modules/$(shell uname -r)/build

all:
	$(MAKE) -C $(KDIR) M=$(CURDIR)

clean:
	$(MAKE) -C $(KDIR) M=$(CURDIR) clean
alexgaynor@penguin ~/p/l/hello-world> env KDIR=/home/alexgaynor/projects/linux/ make
make -C /home/alexgaynor/projects/linux/ M=/home/alexgaynor/projects/linux-kernel-module-rust/hello-world
make[1]: Entering directory '/home/alexgaynor/projects/linux'
make[2]: *** No rule to make target '/home/alexgaynor/projects/linux-kernel-module-rust/hello-world/hello_world.rust.o', needed by '__build'.  Stop.
Makefile:1624: recipe for target '_module_/home/alexgaynor/projects/linux-kernel-module-rust/hello-world' failed
make[1]: *** [_module_/home/alexgaynor/projects/linux-kernel-module-rust/hello-world] Error 2
make[1]: Leaving directory '/home/alexgaynor/projects/linux'
Makefile:5: recipe for target 'all' failed
make: *** [all] Error 2

init/Kconfig Outdated
	def_bool $(success,cargo --version)

config HAS_CARGO_XBUILD
	def_bool $(success,cargo xbuild --version)

cargo xbuild requires nightly Rust. I would suggest that we get the target upstreamed into the Rust project and into stable Rust, which will make it much more palatable to the Linux kernel.

Owner Author

Do you think this is a hard requirement for Rust support moving into mainline? I want to make sure I have my yak stack^W^W dependency chain right :-)

While we do have optional things in the kernel that require third-party tools and we also preemptively add support for upcoming GCC (stable) releases, this is about adding support for a new language which will likely require treewide changes to do properly. Therefore, yes, it would help a lot to show that the proposed changes are reasonably stable.

Not a hard requirement, but you're going to have enough interesting arguments getting this upstream that if we can forestall a couple of them we should. :)

Owner Author

Compelling enough for me! Since I know you're also involved in upstream Rust work, do you have an opinion on process for this? Does it need an RFC, or is a PR enough?

A PR should suffice. CC me and I'll review it.

Owner Author

Awesome -- fine to start with just x86_64?

init/Kconfig Outdated
	depends on HAS_CARGO
	depends on HAS_CARGO_XBUILD
	help
	  Whether to support building modules written in Rust.

@joshtriplett joshtriplett Aug 30, 2019

Even though you autodetect the availability of Rust and Cargo, for the moment at least the request from upstream was to make sure this wasn't enabled by make allyesconfig or make allmodconfig. (For instance, just because a developer has rust and cargo installed for other reasons, doesn't necessarily mean they want their kernel tests to build this for now.)

To avoid that, I'd suggest the following pattern, inspired by a similar approach and requirement for link-time optimization (LTO). Rename config RUST to config RUST_MENU, and then add the following below it:

config RUST_DISABLE
	bool
	depends on RUST_MENU
	help
	  This option disables the support for Rust, so that make 
	  allyesconfig and make allmodconfig will not enable it. To 
	  build modules written in Rust, leave this option set to 'n'.

config RUST
	bool
	default y
	depends on RUST_MENU && !RUST_DISABLE

Owner Author

Thanks! I wasn't sure what the right pattern was here.

Just using "depends on !COMPILE_TEST" is sufficient.

@joshtriplett

I made a couple of comments on items in the code, but I think they're hidden because I commented on a previous version. Links:

bf6e269#r319355179

bf6e269#r319356592

@joshtriplett

joshtriplett commented Aug 31, 2019 via email

@alex
Owner Author

alex commented Aug 31, 2019

Fantastic, thanks much! Hopefully will have a PR up this long weekend!

@alex
Owner Author

alex commented Sep 6, 2019

Just for context, I've paused on this to work on:

  • Getting the kernel target into upstream rust
  • Updating linux-kernel-module-rust to use that target
  • Rebasing this work on that

@alex
Owner Author

alex commented Sep 12, 2019

Ok, this is now working (requires some changes to the rust lib side that I haven't landed yet)! There are still some TODO comments, though.

The next step is going to be to do fishinabarrel/linux-kernel-module-rust#177, which should make this work with no patches required on that side.

Once that happens I'll come back to do these TODOs and do other polish work here.

@@ -1927,14 +1927,27 @@ config HAS_CARGO
 config HAS_CARGO_XBUILD
 	def_bool $(success,cargo xbuild --version)

-config RUST
+config MENU_RUST
 	bool "Enables building kernel modules written in Rust"
 	depends on HAS_RUST

Seems like we only need to know if "cargo xbuild" works, and that's sufficient for "HAS_RUST", in the sense that there's nothing we can do with only the existing HAS_RUST or HAS_CARGO. Only HAS_CARGO_XBUILD is meaningful (it requires rustc and cargo), and the kernel needs full 'cargo xbuild' support.

Owner Author

cargo xbuild isn't even required anymore -- as of a week or two ago, everything we need is in upstream cargo!

I wonder whether we need to check for any of this. If Rust modules get into mainline, we will build them as any other, so we will assume we have rustc etc. available no matter what (and we will want to be able to change it like we do with CC). And if there are no modules, we don't care anyway.

config RUST
	bool
	default y
	depends on RUST_MENU && !RUST_DISABLE

I don't think any of this is needed -- just leave it "config RUST" with a depends on HAS_RUST that does the "cargo xbuild" test. An "allmodconfig" will not include things that require RUST if HAS_RUST isn't set, etc.

Owner Author

Are you saying the whole RUST_DISABLE should be removed entirely, or just that this particular line can be simplified?

alex pushed a commit that referenced this pull request Sep 23, 2019
Try to print out startup pgm check info including exact linux kernel
version, pgm interruption code and ilc, psw and general registers. Like
the following:

Linux version 5.3.0-rc7-07282-ge7b4d41d61bd-dirty (gor@tuxmaker) #3 SMP PREEMPT Thu Sep 5 16:07:34 CEST 2019
Kernel fault: interruption code 0005 ilc:2
PSW : 0000000180000000 0000000000012e52
      R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3
GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 0000000000019a58
      000000000000bf68 0000000000000000 0000000000000000 0000000000000000
      0000000000000000 0000000000000000 000000000001a041 0000000000000000
      0000000004c9c000 0000000000010070 0000000000012e42 000000000000beb0

This info makes it apparent that kernel startup failed and might help
to understand what went wrong without actual standalone dump.

Printing code runs on its own stack of 1 page (at unused 0x5000), which
should be sufficient for sclp_early_printk usage (typical stack usage
observed has been around 512 bytes).

The code has pgm check recursion prevention: whether printing the pgm
check info fails (a follow-on pgm check) or succeeds, it restores the
original faulty psw and gprs and does a disabled wait.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
alex pushed a commit that referenced this pull request Sep 23, 2019
…-ports-shared-buffer'

Ido Schimmel says:

====================
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer

Shalom says:

While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.

Patch #1 prevents changing the CPU port's threshold and binding.

Patch #2 registers the CPU port with devlink.

Patch #3 adds the ability to query the CPU port's shared buffer quotas and
occupancy.

v3:

Patch #2:
* Remove unnecessary wrapping

v2:

Patch #1:
* s/0/MLXSW_PORT_CPU_PORT/
* Assign "mlxsw_sp->ports[MLXSW_PORT_CPU_PORT]" at the end of
  mlxsw_sp_cpu_port_create() to avoid NULL assignment on error path
* Add common functions for mlxsw_core_port_init/fini()

Patch #2:
* Move "changing CPU port's threshold and binding" check to a separate
  patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
@alex alex changed the base branch from master to bsd-urandom-device September 23, 2019 00:13
@alex alex changed the base branch from bsd-urandom-device to master September 23, 2019 00:13
@alex
Owner Author

alex commented Sep 23, 2019

Ok. I've gotten most of the TODOs. One of the major remaining ones is release vs. debug builds.

Does anyone have an opinion on a) which should be the default, and b) what the right way is to control which one you get?

@joshtriplett

joshtriplett commented Sep 23, 2019 via email

@@ -283,6 +283,19 @@ quiet_cmd_cc_lst_c = MKLST $@
$(obj)/%.lst: $(src)/%.c FORCE
	$(call if_changed_dep,cc_lst_c)

# Compile Rust sources
# --------------------

Use the same length as the other sections for the --- line.


# TODO: release/debug
$(obj)/%.rust.o: $(src)/Cargo.toml $(src)/Cargo.lock $(wildcard $(src)/src/*.rs) FORCE
	cd $(src); env -u MAKE -u MAKEFLAGS KDIR="$(CURDIR)/$(srctree)" $(CARGO) build -Z build-std=core,alloc --target=$(CONFIG_ARCH_RUST_TARGET)

Split long lines with backslashes.

@joshtriplett

joshtriplett commented Sep 23, 2019 via email

@ojeda

ojeda commented Sep 23, 2019

This was the specific requirement for merging upstream. People running "make allyesconfig" or "make allmodconfig" don't want to have to have Rust installed. (Analogously, LTO required a newer toolchain and a lot of memory, and people running "make allyesconfig" or "make allmodconfig" didn't want to have to have that.)

I am not saying we require Rust, but rather that if no Rust module is built in all{yes,mod}config, we don't need to care whether $(RUSTC) etc. points somewhere valid or not.

Or, to put it another way: LTO and some other config options (like debug ones) are optional, because we can either use them or not; however, Rust is always mandatory for those modules that are written in it.

@joshtriplett

@ojeda Ah, sorry, I see what you're saying. You're not talking about the config structure with a re-disable option so that it gets disabled on allyesconfig; you're talking about the search for tools?

I do agree that you could just unconditionally use the tools, rather than detecting them, if they're enabled.

The "re-disable" mechanism is still needed, though, so that if someone has rustc installed it still isn't used in the kernel build without specifically being enabled.

@ojeda

ojeda commented Sep 23, 2019

@joshtriplett No need to be sorry! Yeah, I was referring to the detection (rustc --version).

For the re-disabling mechanism, I guess we cannot do anything else if we want to start including some Rust code in mainline (i.e. not just the build mechanism).

alex pushed a commit that referenced this pull request Sep 28, 2019
Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.

Without fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault

This problem is only observed with the forever option (-r 0); runs with
a limited number of repeats work. Calling print_counter with ts set to
NULL is not correct when interval is set. Hence avoid
print_counter(NULL,..)  if interval is set.
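
The guard described above can be sketched in plain user-space C. This is a simplified model rather than the perf source: `struct ts_sk` and the `_sketch` functions are hypothetical, and the real functions take more arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the timestamp that interval mode passes around. */
struct ts_sk {
    unsigned long tv_sec;
    unsigned long tv_nsec;
};

/* Formats the interval prefix; like the crashing code, it dereferences ts. */
void print_interval_sketch(char *prefix, size_t size, const struct ts_sk *ts)
{
    assert(ts != NULL); /* the crashing path reached this point with ts == NULL */
    snprintf(prefix, size, "%6lu.%09lu", ts->tv_sec, ts->tv_nsec);
}

/* Prints either an interval line (ts required) or the final summary. */
void print_counters_sketch(const struct ts_sk *ts, unsigned int interval,
                           char *out, size_t size)
{
    if (interval)
        print_interval_sketch(out, size, ts);
    else
        snprintf(out, size, "summary");
}

/*
 * End-of-run path: only request the NULL-timestamp summary when no
 * interval is set. Returns 1 if something was printed, 0 if skipped.
 */
int finish_run_sketch(unsigned int interval, char *out, size_t size)
{
    if (interval)
        return 0;
    print_counters_sketch(NULL, interval, out, size);
    return 1;
}
```

Calling `finish_run_sketch(5000, buf, sizeof(buf))` leaves `buf` untouched, while `finish_run_sketch(0, buf, sizeof(buf))` writes the summary; the periodic interval path keeps supplying a real timestamp.
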

With fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
   #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles

Committer notes:

Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:

  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  #1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  #2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  #3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  #4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  #5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  #6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)

Mostly the same as just before this patch:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  #1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
      at util/stat-display.c:1172
  #2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  #3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  #4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  #5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  #6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  #7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)

Fixes: d4f63a4 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
alex pushed a commit that referenced this pull request Sep 28, 2019
We release the wrong pointer on the error path in the
cpu_cache_level__read function, leading to a segfault:

  (gdb) r record ls
  Starting program: /root/perf/tools/perf/perf record ls
  ...
  [ perf record: Woken up 1 times to write data ]
  double free or corruption (out)

  Thread 1 "perf" received signal SIGABRT, Aborted.
  0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6
  (gdb) bt
  #0  0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6
  #1  0x00007ffff7443bac in abort () from /lib64/power9/libc.so.6
  #2  0x00007ffff74af8bc in __libc_message () from /lib64/power9/libc.so.6
  #3  0x00007ffff74b92b8 in malloc_printerr () from /lib64/power9/libc.so.6
  #4  0x00007ffff74bb874 in _int_free () from /lib64/power9/libc.so.6
  #5  0x0000000010271260 in __zfree (ptr=0x7fffffffa0b0) at ../../lib/zalloc..
  #6  0x0000000010139340 in cpu_cache_level__read (cache=0x7fffffffa090, cac..
  #7  0x0000000010143c90 in build_caches (cntp=0x7fffffffa118, size=<optimiz..
  ...

Releasing the proper pointer.

Fixes: 720e98b ("perf tools: Add perf data cache feature")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org: # v4.6+
Link: http://lore.kernel.org/lkml/20190912105235.10689-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
alex pushed a commit that referenced this pull request Sep 28, 2019
I'm seeing a bunch of debug prints from a user of print_hex_dump_bytes()
in my kernel logs, but I don't have CONFIG_DYNAMIC_DEBUG enabled nor do I
have DEBUG defined in my build.  The problem is that
print_hex_dump_bytes() calls a wrapper function in lib/hexdump.c that
calls print_hex_dump() with KERN_DEBUG level.  There are three cases to
consider here:

  1. CONFIG_DYNAMIC_DEBUG=y  --> call dynamic_hex_dump()
  2. CONFIG_DYNAMIC_DEBUG=n && DEBUG --> call print_hex_dump()
  3. CONFIG_DYNAMIC_DEBUG=n && !DEBUG --> stub it out

Right now, that last case isn't detected and we still call
print_hex_dump() from the stub wrapper.

Let's make print_hex_dump_bytes() only call print_hex_dump_debug() so that
it works properly in all cases.

Case #1, print_hex_dump_debug() calls dynamic_hex_dump() and we get same
behavior.  Case #2, print_hex_dump_debug() calls print_hex_dump() with
KERN_DEBUG and we get the same behavior.  Case #3, print_hex_dump_debug()
is a nop, changing behavior to what we want, i.e.  print nothing.
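
The three cases can be modelled with a compile-time switch in user-space C. This is a sketch under simplifying assumptions, not the kernel headers: the functions here take fewer parameters than the kernel ones, the `"<7> "` and `"<dyndbg> "` prefixes are placeholders, and `lines_printed` is a hypothetical counter added only to make the behavior observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switches; uncomment one to try case 1 or case 2. */
/* #define CONFIG_DYNAMIC_DEBUG 1 */
/* #define DEBUG 1 */

int lines_printed; /* number of dumps that actually produced output */

/* Simplified stand-in: the kernel function takes more parameters. */
void print_hex_dump(const char *level, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    assert(buf != NULL);
    printf("%s", level);
    for (i = 0; i < len; i++)
        printf("%02x ", p[i]);
    printf("\n");
    lines_printed++;
}

/* Simplified stand-in for the dynamic-debug variant. */
void dynamic_hex_dump(const void *buf, size_t len)
{
    print_hex_dump("<dyndbg> ", buf, len);
}

#if defined(CONFIG_DYNAMIC_DEBUG)
/* Case 1: route through dynamic debug. */
#define print_hex_dump_debug(buf, len) dynamic_hex_dump(buf, len)
#elif defined(DEBUG)
/* Case 2: print at debug level. */
#define print_hex_dump_debug(buf, len) print_hex_dump("<7> ", buf, len)
#else
/* Case 3: compile to nothing. */
#define print_hex_dump_debug(buf, len) do { (void)(buf); (void)(len); } while (0)
#endif

/* After the change: always go through the _debug variant. */
void print_hex_dump_bytes(const void *buf, size_t len)
{
    print_hex_dump_debug(buf, len);
}
```

With neither switch defined, `print_hex_dump_bytes("ab", 2)` prints nothing and `lines_printed` stays at 0, which is the case 3 behavior the commit is after.
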

Link: http://lkml.kernel.org/r/20190816235624.115280-1-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alex pushed a commit that referenced this pull request Oct 11, 2019
Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset includes two small fixes for the mlxsw driver and one
patch which clarifies recently introduced devlink-trap documentation.

Patch #1 clears the port's VLAN filters during port initialization. This
ensures that the drop reason reported to the user is consistent. The
problem is explained in detail in the commit message.

Patch #2 clarifies the description of one of the traps exposed via
devlink-trap.

Patch #3 from Danielle forbids the installation of a tc filter with
multiple mirror actions since this is not supported by the device. The
failure is communicated to the user via extack.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
alex pushed a commit that referenced this pull request Oct 11, 2019
This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
 #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
 dump_stack+0x8a/0xd6
 __lock_acquire.cold+0xe1/0x24d
 lock_acquire+0x106/0x240
 __mutex_lock+0x12e/0xcb0
 mutex_lock_nested+0x1f/0x30
 rdma_destroy_id+0x78/0x4a0 [rdma_cm]
 iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
 cm_work_handler+0xe62/0x1100 [iw_cm]
 process_one_work+0x56d/0xac0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
@alex alex closed this Mar 12, 2024
@alex alex deleted the module-make-rust branch March 12, 2024 23:05

5 participants