Chandan-Rajend…
Commits on Apr 14, 2016
-
Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when moving to a new bio_vec
In __btrfs_lookup_bio_sums() we set the file offset value at the beginning of every iteration of the while loop. This is incorrect since the blocks mapped by the current bvec->bv_page might not yet have been completely processed. This commit fixes the issue by setting the file offset value when we move to the next bvec of the bio. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
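The offset bookkeeping described above can be sketched in userspace C. This is a hypothetical model, not the kernel code: `struct sketch_bvec` stands in for a bio_vec and only records the file range it maps, and the loop records one file offset per block. The offset is reloaded only when the walk moves to the next bvec; resetting it at the top of every iteration (the bug being fixed) would give every block of a multi-block bvec the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bio_vec: the file range it maps. */
struct sketch_bvec {
	uint64_t file_offset;	/* file offset of the first byte in this bvec */
	uint32_t len;		/* bytes covered; may span several blocks */
};

/*
 * Walk the bvecs one block per iteration and record each block's file
 * offset (e.g. to look up its checksum). The offset is reloaded from the
 * bvec only when moving to the next bvec; otherwise it advances by one
 * block, so every block of a multi-block bvec gets its own offset.
 */
static size_t sketch_block_offsets(const struct sketch_bvec *bvecs,
				   size_t nr_bvecs, uint32_t blocksize,
				   uint64_t *out, size_t out_cap)
{
	size_t n = 0, i = 0;
	uint32_t done = 0;	/* bytes of bvecs[i] already processed */
	uint64_t offset;

	assert(blocksize > 0);
	if (nr_bvecs == 0)
		return 0;

	offset = bvecs[0].file_offset;
	while (i < nr_bvecs && n < out_cap) {
		out[n++] = offset;
		done += blocksize;
		if (done >= bvecs[i].len) {
			/* Current bvec fully processed: move on and reload. */
			i++;
			done = 0;
			if (i < nr_bvecs)
				offset = bvecs[i].file_offset;
		} else {
			/* More blocks left in this bvec: advance one block. */
			offset += blocksize;
		}
	}
	return n;
}
```

For example, with a 4K block size, a bvec covering file offsets 0-8191 followed by one starting at 65536 yields the offsets 0, 4096 and 65536.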
-
Btrfs: subpage-blocksize: Make file extent relocate code subpage blocksize aware
The file extent relocation code currently assumes the blocksize to be the same as PAGE_CACHE_SIZE. This commit adds code to support the subpage-blocksize scenario. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: btrfs_clone: Flush dirty blocks of a page that do not map the clone range
After cloning the required extents, we truncate all the pages that map the file range being cloned. In the subpage-blocksize scenario, we could have dirty blocks before and/or after the clone range in the leading/trailing pages. Truncating these pages would lead to data loss. Hence this commit forces such dirty blocks to be flushed to disk before performing the clone operation. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
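To make the leading/trailing page problem concrete, here is a hypothetical helper (not from the patch) that computes which bytes of the first and last pages fall outside a clone range; these are the regions whose dirty blocks would need flushing before the pages are truncated. It assumes a power-of-two page size and says nothing about which of those blocks are actually dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A byte range [start, end); 'valid' is false when the range is empty. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
	bool valid;
};

static void sketch_check_page_size(uint64_t page_size)
{
	/* Page sizes are powers of two, which the masking below relies on. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
}

/* Bytes of the leading page that lie before the clone range. */
static struct sketch_range sketch_leading_outside(uint64_t off,
						  uint64_t page_size)
{
	struct sketch_range r;

	sketch_check_page_size(page_size);
	r.start = off & ~(page_size - 1);	/* round down to page start */
	r.end = off;
	r.valid = r.end > r.start;
	return r;
}

/* Bytes of the trailing page that lie after the clone range. */
static struct sketch_range sketch_trailing_outside(uint64_t off, uint64_t len,
						   uint64_t page_size)
{
	struct sketch_range r;

	sketch_check_page_size(page_size);
	r.start = off + len;
	r.end = (r.start + page_size - 1) & ~(page_size - 1); /* round up */
	r.valid = r.end > r.start;
	return r;
}
```

For a 16K clone starting at 8K with 64K pages, the leading gap is [0, 8192) and the trailing gap is [24576, 65536); a page-aligned range has neither.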
-
Btrfs: subpage-blocksize: Enable dedupe ioctl
The function implementing the dedupe ioctl, i.e. btrfs_ioctl_file_extent_same(), returns with an error in the subpage-blocksize scenario. This was done because Btrfs did not have code to deal with block size < page size. This commit removes this restriction since we now support "block size < page size". Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: extent_clear_unlock_delalloc: Prevent page from being unlocked more than once
extent_clear_unlock_delalloc() can unlock a page more than once, as shown below (assume 4k as the block size and 64k as the page size):
  cow_file_range
    create 4k ordered extent corresponding to page offsets 0 - 4095
    extent_clear_unlock_delalloc corresponding to page offsets 0 - 4095
      unlock page
    create 4k ordered extent corresponding to page offsets 4096 - 8191
    extent_clear_unlock_delalloc corresponding to page offsets 4096 - 8191
      unlock page
To prevent such a scenario, this commit passes "delalloc end" to extent_clear_unlock_delalloc() to help decide whether the page can be unlocked or not. NOTE: Since extent_clear_unlock_delalloc() is also used by the compression code, the commit passes the ordered extent "end" as the value for the argument corresponding to "delalloc end" for invocations made from the compression code path. This will be fixed by a future commit that gets compression to work in the subpage-blocksize scenario. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: Fix file defragmentation code
This commit gets the file defragmentation code to work in the subpage-blocksize scenario. It does this by keeping track of page offsets that mark block boundaries and passing them as arguments to the functions that implement the defragmentation logic. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
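One way to picture "page offsets that mark block boundaries" is a helper that clamps a file range to one page and widens it to whole blocks. The function below is a hypothetical illustration, assuming power-of-two page and block sizes with the block size dividing the page size; the actual patch's helpers and arguments may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Block-aligned span inside one page, as page-relative offsets [pg_start, pg_end). */
struct sketch_pg_span {
	uint32_t pg_start;
	uint32_t pg_end;
};

/*
 * Clamp the file range [start, end) to the page at page_index and widen
 * it to block boundaries. Assumes power-of-two sizes with blocksize
 * dividing page_size, so page boundaries are also block boundaries.
 */
static struct sketch_pg_span sketch_block_span_in_page(uint64_t page_index,
						       uint64_t start,
						       uint64_t end,
						       uint32_t page_size,
						       uint32_t blocksize)
{
	uint64_t mask = (uint64_t)blocksize - 1;
	uint64_t page_start = page_index * page_size;
	uint64_t page_end = page_start + page_size;
	uint64_t s = start > page_start ? start : page_start;
	uint64_t e = end < page_end ? end : page_end;
	struct sketch_pg_span span;

	assert(blocksize != 0 && (blocksize & mask) == 0);
	assert(page_size % blocksize == 0);
	assert(s < e);			/* the range must intersect this page */
	s &= ~mask;			/* round down to a block boundary */
	e = (e + mask) & ~mask;		/* round up to a block boundary */
	span.pg_start = (uint32_t)(s - page_start);
	span.pg_end = (uint32_t)(e - page_start);
	return span;
}
```

With 64K pages and 4K blocks, the file range [65536 + 5000, 65536 + 10000) in page 1 maps to the block-aligned page offsets [4096, 12288).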
-
Revert "btrfs: fix lockups from btrfs_clear_path_blocking"
The patch "Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback flag is set" requires btrfs_try_tree_write_lock() to be a true try lock w.r.t. both spinning and blocking locks. During the 2015 Vault Conference Btrfs meetup, Chris Mason suggested that he would write up a suitable locking function to be used when writing dirty pages that map metadata blocks. Until such a locking function is available, this patch temporarily reverts commit f82c458.
-
Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback flag is set
In the non-subpage-blocksize scenario, the BTRFS_HEADER_FLAG_WRITTEN flag prevents Btrfs code from writing into an extent buffer whose pages are under writeback. This facility isn't sufficient in the subpage-blocksize scenario, since more than one extent buffer is mapped to a page. Hence this patch adds a new flag (i.e. EXTENT_BUFFER_HEAD_WRITEBACK) and corresponding code to track the writeback status of the page and to prevent writes to any of the extent buffers mapped to the page while writeback is in progress. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
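The difference between the old per-buffer check and a page-wide writeback flag can be modeled as follows. All names are hypothetical stand-ins; only the idea of a head-level flag (EXTENT_BUFFER_HEAD_WRITEBACK in the patch) comes from the commit. Once one buffer in the page starts writeback, the per-buffer rule would still allow writes to its neighbour, while the head-level rule refuses them until writeback ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EB_WRITTEN     0x1u	/* per buffer: header marked as written */
#define SKETCH_HEAD_WRITEBACK 0x1u	/* per head: shared page under writeback */

/* Hypothetical stand-ins: every extent buffer in a page points at one head. */
struct sketch_eb_head {
	unsigned int flags;
};

struct sketch_eb {
	struct sketch_eb_head *head;
	unsigned int flags;
};

/* Old rule: only the buffer's own "written" flag is consulted. */
static bool sketch_may_modify_old(const struct sketch_eb *eb)
{
	return !(eb->flags & SKETCH_EB_WRITTEN);
}

/* New rule: also refuse while the shared page is under writeback. */
static bool sketch_may_modify(const struct sketch_eb *eb)
{
	assert(eb->head != NULL);
	return sketch_may_modify_old(eb) &&
	       !(eb->head->flags & SKETCH_HEAD_WRITEBACK);
}

/* Writing out one buffer puts the whole page under writeback. */
static void sketch_start_writeback(struct sketch_eb *eb)
{
	eb->flags |= SKETCH_EB_WRITTEN;
	eb->head->flags |= SKETCH_HEAD_WRITEBACK;
}

static void sketch_end_writeback(struct sketch_eb_head *head)
{
	head->flags &= ~SKETCH_HEAD_WRITEBACK;
}
```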
-
Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check
In the subpage-blocksize case, the file blocks to be punched may map only part of a page. For file blocks inside such pages, we need to check for the presence of the BLK_STATE_UPTODATE flag. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
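A per-block uptodate check over a bitmap might look like the sketch below. It assumes a hypothetical layout in which bit i of a 64-bit word is set when block i of the page is uptodate; the patch's BLK_STATE_UPTODATE tracking itself is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-page state: bit i of 'uptodate' is set when block i of
 * the page is uptodate (at most 64 blocks per page in this sketch).
 * Returns true only if every block overlapping the page-relative byte
 * range [pg_start, pg_end) is uptodate.
 */
static bool sketch_range_uptodate(uint64_t uptodate, uint32_t pg_start,
				  uint32_t pg_end, uint32_t blocksize)
{
	uint32_t first, last;

	assert(blocksize != 0 && pg_start < pg_end);
	first = pg_start / blocksize;
	last = (pg_end - 1) / blocksize;	/* inclusive */
	assert(last < 64);
	for (uint32_t i = first; i <= last; i++)
		if (!(uptodate & (1ull << i)))
			return false;
	return true;
}
```

With 4K blocks and only blocks 0-3 uptodate, the range [0, 16384) passes while [4096, 20480) fails because it touches block 4.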
-
Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an ordered extent.
In the subpage-blocksize scenario a page can have more than one block. So in addition to the PagePrivate2 flag, we have to track the I/O status of each block of a page to reliably mark the ordered extent as complete. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
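Why per-block tracking helps can be shown with a small model: if completions clear bits rather than decrement a counter, a block reported twice cannot mark the ordered extent complete early. `struct sketch_ordered` and its helpers are hypothetical, handle at most 64 blocks, and do no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordered extent with per-block I/O tracking (<= 64 blocks). */
struct sketch_ordered {
	uint64_t file_offset;	/* start of the ordered extent */
	uint32_t blocksize;
	uint64_t pending;	/* bit i set => block i still has I/O in flight */
};

static void sketch_ordered_init(struct sketch_ordered *oe,
				uint64_t file_offset, uint32_t nr_blocks,
				uint32_t blocksize)
{
	assert(nr_blocks >= 1 && nr_blocks <= 64 && blocksize != 0);
	oe->file_offset = file_offset;
	oe->blocksize = blocksize;
	oe->pending = nr_blocks == 64 ? ~0ull : (1ull << nr_blocks) - 1;
}

/*
 * Record completed I/O for [start, start + len). Clearing bits (instead of
 * decrementing a counter) means a block reported twice cannot finish the
 * extent early. Returns true once no block is pending.
 */
static bool sketch_ordered_io_done(struct sketch_ordered *oe, uint64_t start,
				   uint64_t len)
{
	uint64_t first, last;

	assert(len != 0 && start >= oe->file_offset);
	first = (start - oe->file_offset) / oe->blocksize;
	last = (start + len - 1 - oe->file_offset) / oe->blocksize;
	assert(last < 64);
	for (uint64_t i = first; i <= last; i++)
		oe->pending &= ~(1ull << i);
	return oe->pending == 0;
}
```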
-
Btrfs: subpage-blocksize: Deal with partial ordered extent allocations.
In the subpage-blocksize scenario, extent allocations for only some of the dirty blocks of a page may succeed, while allocations for the remaining blocks fail. This patch allows I/O against such pages to be submitted. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize < PAGE_SIZE
This patch allows mounting filesystems with a sectorsize smaller than PAGE_SIZE. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: Write only dirty extent buffers belonging to a page
For the subpage-blocksize scenario, this patch adds the ability to write a single extent buffer to the disk. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: Read tree blocks whose size is < PAGE_CACHE_SIZE
In the case of subpage-blocksize, this patch makes it possible to read only a single metadata block from the disk instead of all the metadata blocks that map into a page. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: Define extent_buffer_head.
In order to handle multiple extent buffers per page, first we need to create a way to handle all the extent buffers that are attached to a page. This patch creates a new data structure 'struct extent_buffer_head', and moves fields that are common to all extent buffers from 'struct extent_buffer' to 'struct extent_buffer_head'. Also, this patch moves the EXTENT_BUFFER_TREE_REF, EXTENT_BUFFER_DUMMY and EXTENT_BUFFER_IN_TREE flags from extent_buffer->ebflags to extent_buffer_head->bflags. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
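A rough userspace picture of the split between shared and per-buffer state follows. The field set and the fixed-size array are assumptions for illustration (the commit does not list every moved field); what it tries to mirror is that the TREE_REF, DUMMY and IN_TREE flags live on the head, while each buffer keeps its own flags and a pointer back to the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for the sketch: 64K page / 4K nodesize. */
#define SKETCH_MAX_EBS_PER_PAGE 16

/* Head-level flags; the commit moves TREE_REF, DUMMY and IN_TREE to the head. */
#define SKETCH_HEAD_TREE_REF (1ul << 0)
#define SKETCH_HEAD_DUMMY    (1ul << 1)
#define SKETCH_HEAD_IN_TREE  (1ul << 2)

struct sketch_extent_buffer_head;

/* Per-tree-block state. */
struct sketch_extent_buffer {
	uint64_t start;				/* logical address of the block */
	uint32_t len;				/* nodesize */
	unsigned long ebflags;			/* per-buffer flags */
	struct sketch_extent_buffer_head *ebh;	/* shared state for the page */
};

/* State shared by every extent buffer that lives in the same page. */
struct sketch_extent_buffer_head {
	void *page;
	unsigned long bflags;			/* TREE_REF / DUMMY / IN_TREE */
	int refs;
	size_t nr_ebs;
	struct sketch_extent_buffer ebs[SKETCH_MAX_EBS_PER_PAGE];
};

/* Attach one more tree block to the page's head. */
static struct sketch_extent_buffer *
sketch_ebh_add(struct sketch_extent_buffer_head *ebh, uint64_t start,
	       uint32_t len)
{
	struct sketch_extent_buffer *eb;

	assert(ebh->nr_ebs < SKETCH_MAX_EBS_PER_PAGE);
	eb = &ebh->ebs[ebh->nr_ebs++];
	eb->start = start;
	eb->len = len;
	eb->ebflags = 0;
	eb->ebh = ebh;
	return eb;
}
```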
-
Btrfs: subpage-blocksize: Make sure delalloc range intersects with the locked page's range
find_delalloc_range indirectly depends on EXTENT_UPTODATE to make sure that the delalloc range returned intersects with the file range mapped by the page. Since we now track "uptodate" state in a per-page bitmap (i.e. in btrfs_page_private->bstate), this commit makes an explicit check to make sure that the delalloc range starts from within the file range mapped by the page. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
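The explicit check amounts to a range test on the start of the delalloc range. A hypothetical helper, using inclusive end offsets:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Does a delalloc range found by the search start inside the file range
 * mapped by the locked page? The page covers the inclusive file range
 * [page_index * page_size, page_index * page_size + page_size - 1].
 */
static bool sketch_delalloc_starts_in_page(uint64_t delalloc_start,
					   uint64_t page_index,
					   uint32_t page_size)
{
	uint64_t page_start, page_end;

	assert(page_size != 0);
	page_start = page_index * page_size;
	page_end = page_start + page_size - 1;
	return delalloc_start >= page_start && delalloc_start <= page_end;
}
```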
-
Btrfs: subpage-blocksize: Fix whole page write
For the subpage-blocksize scenario, a page can contain multiple blocks. In such cases, this patch handles writing data to files. Also, when setting EXTENT_DELALLOC, we no longer set the EXTENT_UPTODATE bit on the extent_io_tree, since uptodate status is tracked by the bitmap pointed to by page->private. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
-
Btrfs: subpage-blocksize: Fix whole page read.
For the subpage-blocksize scenario, a page can contain multiple blocks. In such cases, this patch handles reading data from files. To track the status of individual blocks of a page, this patch makes use of a bitmap pointed to by the newly introduced per-page 'struct btrfs_page_private'. The per-page btrfs_page_private->io_lock plays the same role as BH_Uptodate_Lock (see end_buffer_async_read()), i.e. without the io_lock we may end up in the following situation.
NOTE: Assume 64k page size and 4k block size. Also assume that the first 12 blocks of the page are contiguous while the next 4 blocks are contiguous. When reading the page we end up submitting two "logical address space" bios. So the end_bio_extent_readpage function is invoked twice, once for each bio.
|-------------------------+-------------------------+-------------|
| Task A                  | Task B                  | Task C      |
|-------------------------+-------------------------+-------------|
| end_bio_extent_readpage |                         |             |
| process block 0         |                         |             |
| - clear BLK_STATE_IO    |                         |             |
| - page_read_complete    |                         |             |
| process block 1         |                         |             |
|                         |                         |             |
|                         |                         |             |
|                         | end_bio_extent_readpage |             |
|                         | process block 0         |             |
|                         | - clear BLK_STATE_IO    |             |
|                         | - page_read_complete    |             |
|                         | process block 1         |             |
|                         |                         |             |
| process block 11        | process block 3         |             |
| - clear BLK_STATE_IO    | - clear BLK_STATE_IO    |             |
| - page_read_complete    | - page_read_complete    |             |
|   - returns true        |   - returns true        |             |
|   - unlock_page()       |                         |             |
|                         |                         | lock_page() |
|                         |   - unlock_page()       |             |
|-------------------------+-------------------------+-------------|
We end up incorrectly unlocking the page twice and "Task C" ends up working on an unlocked page. So private->io_lock makes sure that only one of the tasks gets "true" as the return value when page_io_complete() is invoked. As an optimization the patch gets the io_lock only when the last block of the bio_vec is being processed. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
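The io_lock reasoning above can be modeled in userspace, with a pthread mutex standing in for the per-page lock. This is a sketch under assumptions (at most 64 blocks per page, hypothetical names), not the patch's page_io_complete(): the bit clearing and the "is everything done?" test happen under the same lock, and only the caller whose update empties the bitmap gets true, so the page would be unlocked exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page private data; the kernel would use a spinlock. */
struct sketch_page_private {
	pthread_mutex_t io_lock;
	uint64_t io_pending;	/* bit i set => block i still has read I/O in flight */
};

/*
 * Clear the I/O bits of blocks [first, last] and return true only for the
 * caller whose update takes the page from "I/O pending" to "no I/O pending".
 * Clearing and testing under one lock means two bio completions can never
 * both see "complete".
 */
static bool sketch_page_io_complete(struct sketch_page_private *priv,
				    unsigned int first, unsigned int last)
{
	bool was_pending, done;

	assert(first <= last && last < 64);
	pthread_mutex_lock(&priv->io_lock);
	was_pending = priv->io_pending != 0;
	for (unsigned int i = first; i <= last; i++)
		priv->io_pending &= ~(1ull << i);
	done = was_pending && priv->io_pending == 0;
	pthread_mutex_unlock(&priv->io_lock);
	return done;
}
```

With the 12 + 4 block layout from the example, the completion for blocks 0-11 returns false and the one for blocks 12-15 returns true, in either order. Taking the lock once per completed range loosely mirrors the patch's optimization of taking io_lock only for the last block of a bio_vec.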
Commits on Apr 13, 2016
-
Merge tag 'perf-core-for-mingo-20160413' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Print callchains asked for events requested via 'perf trace --event' too (Arnaldo Carvalho de Melo):
    # trace -e nanosleep --call dwarf --event sched:sched_switch/call-graph=fp/ usleep 1
       0.346 (0.005 ms): usleep/24428 nanosleep(rqtp: 0x7fffa15a0540) ...
       0.346 ( ): sched:sched_switch:usleep:24428 [120] S ==> swapper/3:0 [120])
                 __schedule+0xfe200402 ([kernel.kallsyms])
                 schedule+0xfe200035 ([kernel.kallsyms])
                 do_nanosleep+0xfe20006f ([kernel.kallsyms])
                 hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
                 sys_nanosleep+0xfe20007a ([kernel.kallsyms])
                 do_syscall_64+0xfe200062 ([kernel.kallsyms])
                 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                 __nanosleep+0xffff005b8d602010 (/usr/lib64/libc-2.22.so)
       0.400 (0.059 ms): usleep/24428 ... [continued]: nanosleep()) = 0
                 __nanosleep+0x10 (/usr/lib64/libc-2.22.so)
                 usleep+0x34 (/usr/lib64/libc-2.22.so)
                 main+0x1eb (/usr/bin/usleep)
                 __libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
                 _start+0x29 (/usr/bin/usleep)
- Allow requesting that some CPUs or PIDs be highlighted in 'perf sched map' (Jiri Olsa)
- Compact 'perf sched map' to show just CPUs with activity, improving the output in high core count systems (Jiri Olsa)
- Fix segfault with 'perf trace --no-syscalls -e syscall-names' by bailing out on such requests; it doesn't make sense to ask for no syscalls and then specify which ones should be printed (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar committed Apr 13, 2016
-
perf trace: Do not accept --no-syscalls together with -e
It doesn't make sense and was causing a segfault; fix it:
  # trace -e clone --no-syscalls --event sched:*exec firefox
  The -e option can't be used with --no-syscalls.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ccrahezikdk2uebptzr1eyyi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo committed Apr 13, 2016
-
perf evsel: Move some methods from session.[ch] to evsel.[ch]
Those were converted to be evsel methods long ago; move the source to where it belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-vja8rjmkw3gd5ungaeyb5s2j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo committed Apr 13, 2016
-
perf sched map: Display only given cpus
Introduce the --cpus option, which displays only the given cpus. It can be used together with the --color-cpus option.
  $ perf sched map --cpus 0,1
    *A0 309999.786924 secs A0 => rcu_sched:7
    *. 309999.786930 secs
    *B0 . 309999.786931 secs B0 => rcuos/2:25
    B0 *A0 309999.786947 secs
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-9-git-send-email-jolsa@kernel.org [ Added entry to man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf sched map: Color given cpus
Add the --color-cpus option to display the selected cpus with a background color (red by default). It helps with navigating through the 'perf sched map' output. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-8-git-send-email-jolsa@kernel.org [ Added entry to man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf sched map: Color given pids
Add the --color-pids option to display the selected pids in color (blue by default). It helps with navigating through the 'perf sched map' output. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-7-git-send-email-jolsa@kernel.org [ Added entry to man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf thread_map: Make new_by_tid_str constructor public
It will be used in the following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf sched: Use color_fprintf for output
As preparation for the next patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf sched: Add compact display option
Add a compact map display that does not output the whole cpu matrix, only the cpus that got events.
  $ perf sched map --compact
    *A0           1082427.094098 secs A0 => perf:19404 (CPU 2)
     A0 *.        1082427.094127 secs .  => swapper:0 (CPU 1)
     A0  .  *B0   1082427.094174 secs B0 => rcuos/2:25 (CPU 3)
     A0  .  *.    1082427.094177 secs
    *C0  .   .    1082427.094187 secs C0 => migration/2:21
     C0 *A0  .    1082427.094193 secs
    *.   A0  .    1082427.094195 secs
    *D0  A0  .    1082427.094402 secs D0 => rngd:968
    *.   A0  .    1082427.094406 secs
     .  *E0  .    1082427.095221 secs E0 => kworker/1:1:5333
     .   E0 *F0   1082427.095227 secs F0 => xterm:3342
It helps to display sane output for small thread loads on big cpu servers. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-4-git-send-email-jolsa@kernel.org [ Add entry in 'perf sched' man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf cpu_map: Add has() method
Add cpu_map__has() to return whether a cpu is present in the cpus map. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
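A has() method of this kind is typically a linear scan over the map entries. The sketch below uses a simplified stand-in struct rather than perf's actual `struct cpu_map`, so treat the layout as an assumption; thread_map__has() would follow the same pattern for pids.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_CPUS 8

/* Simplified stand-in for a perf cpu map: 'nr' valid entries in map[]. */
struct sketch_cpu_map {
	int nr;
	int map[SKETCH_MAX_CPUS];
};

/* Linear scan: true if 'cpu' is one of the map's entries. */
static bool sketch_cpu_map__has(const struct sketch_cpu_map *cpus, int cpu)
{
	assert(cpus->nr >= 0 && cpus->nr <= SKETCH_MAX_CPUS);
	for (int i = 0; i < cpus->nr; i++)
		if (cpus->map[i] == cpu)
			return true;
	return false;
}
```

A filter such as 'perf sched map --cpus 0,1' could then skip any CPU for which the check returns false.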
-
perf thread_map: Add has() method
Add thread_map__has() to return whether a pid is present in the threads map. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf trace: Support callchains for --event too
We were already able to ask for callchains for a specific event:
  # trace -e nanosleep --call dwarf --event sched:sched_switch/call-graph=fp/ usleep 1
This would enable tracing just the "nanosleep" syscall, with callchains at syscall exit, and would ask the kernel for frame pointer callchains to be enabled for the "sched:sched_switch" tracepoint event. It's just that we were not resolving the callchain and printing it in 'perf trace'; do it:
  # trace -e nanosleep --call dwarf --event sched:sched_switch/call-graph=fp/ usleep 1
     0.425 ( 0.013 ms): usleep/6718 nanosleep(rqtp: 0x7ffcc1d16e20) ...
     0.425 ( ): sched:sched_switch:usleep:6718 [120] S ==> swapper/2:0 [120])
               __schedule+0xfe200402 ([kernel.kallsyms])
               schedule+0xfe200035 ([kernel.kallsyms])
               do_nanosleep+0xfe20006f ([kernel.kallsyms])
               hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
               sys_nanosleep+0xfe20007a ([kernel.kallsyms])
               do_syscall_64+0xfe200062 ([kernel.kallsyms])
               return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
               __nanosleep+0xffff008b8cbe2010 (/usr/lib64/libc-2.22.so)
     0.486 ( 0.073 ms): usleep/6718 ... [continued]: nanosleep()) = 0
               __nanosleep+0x10 (/usr/lib64/libc-2.22.so)
               usleep+0x34 (/usr/lib64/libc-2.22.so)
               main+0x1eb (/usr/bin/usleep)
               __libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
               _start+0x29 (/usr/bin/usleep)
  #
Pretty compact, huh? DWARF callchains for raw_syscalls:sys_exit plus frame pointer callchains for a tracepoint; if your hardware supports LBR, go wild with /call-graph=lbr/. I guess the next step is to lift this from 'perf script':
  -F, --fields <str>  comma separated output fields prepend with 'type:'. Valid types: hw,sw,trace,raw. Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,period,iregs,brstack,brstacksym,flags
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2e7yiv5hqdm8jywlmfivvx2v@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo committed Apr 13, 2016
-
perf/x86/amd/uncore: Do not register a task ctx for uncore PMUs
The new sanity check introduced by: 2665784 ("perf/core: Verify we have a single perf_hw_context PMU") ... triggered on the AMD uncore driver. Uncore PMUs are per node; they cannot have per-task counters. Fix it. Reported-by: Borislav Petkov <bp@suse.de> Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@redhat.com Cc: alexander.shishkin@linux.intel.com Cc: eranian@google.com Cc: jolsa@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: vincent.weaver@maine.edu Link: http://lkml.kernel.org/r/20160404140208.GA3448@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra authored and Ingo Molnar committed Apr 13, 2016
-
perf/x86/intel/pt: Use boot_cpu_has() because it's there
At the moment, the initialization path uses test_cpu_cap(&boot_cpu_data) to detect PT, which is just open-coding boot_cpu_has(). Use the latter instead. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/1459953307-14372-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
uprobes/x86: Constify uprobe_xol_ops structures
The uprobe_xol_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/1460200649-32526-1-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
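Constifying an operations table is a common C idiom: the table holds only function pointers that never change, so declaring it `const` lets the compiler typically place it in read-only data and reject accidental writes. The struct and member names below are hypothetical and only loosely modeled on uprobe_xol_ops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table in the style of an XOL ops structure. */
struct sketch_xol_ops {
	int (*emulate)(int insn);
	int (*pre_xol)(int insn);
};

static int sketch_emulate(int insn) { return insn + 1; }
static int sketch_pre_xol(int insn) { return insn * 2; }

/* Never modified after initialization, so it is declared const. */
static const struct sketch_xol_ops sketch_default_ops = {
	.emulate = sketch_emulate,
	.pre_xol = sketch_pre_xol,
};

/* Users take a pointer-to-const, so writes through it fail to compile. */
static int sketch_run(const struct sketch_xol_ops *ops, int insn)
{
	assert(ops != NULL);
	return ops->pre_xol(ops->emulate(insn));
}
```

Here sketch_run(&sketch_default_ops, 3) evaluates (3 + 1) * 2, i.e. 8; an assignment such as `sketch_default_ops.emulate = NULL;` would be rejected at compile time.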
-
Merge tag 'perf-core-for-mingo-20160411' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements from Arnaldo Carvalho de Melo:
User visible changes:
- Automagically create a 'bpf-output' event, easing the setup of BPF C "scripts" that produce output via the perf ring buffer. Now it is just a matter of calling any perf tool, such as 'trace', with a C source file that references the __bpf_stdout__ output channel, and that channel will be created and connected to the script:
    # trace -e nanosleep --event test_bpf_stdout.c usleep 1
       0.013 ( 0.013 ms): usleep/2818 nanosleep(rqtp: 0x7ffcead45f40 ) ...
       0.013 ( ): __bpf_stdout__:Raise a BPF event!..)
       0.015 ( ): perf_bpf_probe:func_begin:(ffffffff81112460))
       0.261 ( ): __bpf_stdout__:Raise a BPF event!..)
       0.262 ( ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
       0.264 ( 0.264 ms): usleep/2818 ... [continued]: nanosleep()) = 0
    #
  Further work is needed to reduce the number of lines in a perf bpf C source file; this is the part where we greatly reduce the command line setup (Wang Nan)
- 'perf trace' now supports callchains, with 'trace --call-graph dwarf' using libunwind, just like 'perf top', to ask the kernel for stack dumps for CFI processing. This reduces the overhead by asking just for userspace callchains and also only for the syscall exit tracepoint (raw_syscalls:sys_exit) (Milian Wolff, Arnaldo Carvalho de Melo)
  Try it with, for instance:
    # perf trace --call dwarf ping 127.0.0.1
  An excerpt of a system wide 'perf trace --call dwarf' session is at:
    https://fedorapeople.org/~acme/perf/perf-trace--call-graph-dwarf--all-cpus.txt
  You may need to bump the number of mmap pages, using -m/--mmap-pages, but on a Broadwell machine the defaults allowed system wide tracing to work without losing that many records. Experiment with just some syscalls, like:
    # perf trace --call dwarf -e nanosleep,futex
  All the targets available for 'perf record' and 'perf top' (--pid, --tid, --cpu, etc.) should work. --duration may also be interesting to try.
  To get filenames from the pointer args of various syscalls (open, etc.), add this to the mix:
    # perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
  Making this work is next in line:
    # trace --call dwarf --ev sched:sched_switch/call-graph=fp/ usleep 1
  I.e. honouring per-tracepoint callchains in 'perf trace' in addition to raw_syscalls:sys_exit.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar committed Apr 13, 2016
-
Merge tag 'perf-core-for-mingo-20160408' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Beautify more syscall arguments in 'perf trace', using the type column in tracepoint /format fields to attach, for instance, a pid_t resolver to the thread COMM; also attach a mode_t beautifier in the same fashion (Arnaldo Carvalho de Melo)
- Build the syscall table id <-> name resolver using the same .tbl file used in the kernel to generate headers, to avoid the delay in getting new syscalls supported in the audit-libs external dependency; done so far only for x86_64 (Arnaldo Carvalho de Melo)
- Improve the documentation of event specifications (Andi Kleen)
- Process update events in 'perf script', fixing up this use case:
    # perf stat -a -I 1000 -e cycles record | perf script -s script.py
- Shared object symbol adjustment fixes, fixing symbol resolution in Android (Wang Nan)
Infrastructure changes:
- Add a dedicated unwind addr_space member to the thread struct, to allow tools to use thread->priv; noticed while working on having callchains in 'perf trace' (Jiri Olsa)
Build fixes:
- Fix the build in Ubuntu 12.04 (Arnaldo Carvalho de Melo, Vinson Lee)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar committed Apr 13, 2016